anoopj commented on code in PR #16408:
URL: https://github.com/apache/iceberg/pull/16408#discussion_r3336214895
##########
core/src/main/java/org/apache/iceberg/DeletionVectorStruct.java:
##########
@@ -51,7 +51,7 @@ private DeletionVectorStruct(DeletionVectorStruct toCopy) {
}
private DeletionVectorStruct(String location, long offset, long sizeInBytes,
long cardinality) {
Review Comment:
Thanks for running the benchmark. I suspect the I/O and Parquet decoding
costs are masking the cost of boxing in the readers. For instance, an
`int` takes 4 bytes in memory, while an `Integer` costs about 20 bytes on a
typical 64-bit HotSpot JVM with compressed oops (4 bytes to store the value +
12 bytes of `Object` header + 4 bytes to hold a reference).
That is a 5x difference in memory footprint. There are also hidden costs, such
as loss of cache locality, allocation overhead, and GC tracking.
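The footprint arithmetic above can be sketched as follows. This is only a rough estimate, not a measurement: the 12-byte header, 4-byte reference, and 8-byte alignment are assumptions that hold for a 64-bit HotSpot JVM with compressed oops and may differ on other JVMs or settings. The class and method names are hypothetical.

```java
final class BoxingFootprint {
    // Assumed layout constants (64-bit HotSpot, compressed oops enabled).
    static final int OBJECT_HEADER_BYTES = 12;
    static final int REFERENCE_BYTES = 4;
    static final int ALIGNMENT_BYTES = 8;

    // Round an object size up to the next multiple of the alignment.
    static long align(long size) {
        return (size + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Estimated cost of one boxed value: the wrapper object plus the
    // reference that points to it from a field or array slot.
    static long boxedBytes(int payloadBytes) {
        return align(OBJECT_HEADER_BYTES + payloadBytes) + REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("int: " + Integer.BYTES + " bytes, Integer: "
            + boxedBytes(Integer.BYTES) + " bytes");
        System.out.println("long: " + Long.BYTES + " bytes, Long: "
            + boxedBytes(Long.BYTES) + " bytes");
    }
}
```

Under the same assumptions, a boxed `Long` comes out larger still, because the 8-byte payload pushes the wrapper object past the next alignment boundary.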
In some cases, modern JIT compilers (such as Graal) can apply escape
analysis: i.e., if the JIT can prove that a boxed wrapper doesn't escape its
method, it can elide the heap allocation. This works for ephemeral boxes in
pure-arithmetic methods. But it doesn't work when the box is stored in a field,
returned from a method, put in a collection, or passed through a megamorphic
call site (because the call can't be inlined).
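To illustrate the distinction, here is a minimal sketch with hypothetical names. Whether the JIT actually removes the allocation in the first method depends on the JVM, its flags, and inlining decisions, so treat the comments as the expected tendency rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeSketch {
    // The box is created and consumed inside the loop body and never leaves
    // the method, so it is a candidate for allocation elision by the JIT.
    static long sumLocal(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Integer boxed = i; // autoboxing via Integer.valueOf
            total += boxed;    // unboxed again immediately
        }
        return total;
    }

    // Each box is stored in a collection that is returned to the caller,
    // so it escapes and has to exist as a real heap object.
    // (Integer.valueOf reuses cached instances for small values, at least
    // -128..127, but values outside the cache are allocated.)
    static List<Integer> collect(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(sumLocal(1_000));
        System.out.println(collect(1_000).size());
    }
}
```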
So for fields on a struct that gets stored in maps/lists during scan
planning, we will end up paying the full boxing tax.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]