wombatu-kun opened a new pull request, #16657:
URL: https://github.com/apache/iceberg/pull/16657

   `RecordConverter.convertListValue` built its result with 
`list.stream().map(...).collect(Collectors.toList())`, and `convertMapValue` 
collected into an unsized `Maps.newHashMap()`; both also recomputed the 
element/key/value field ids (`type.fields().get(...).fieldId()`) once per 
element inside the lambda.
   
   This converts both to a pre-sized loop: `convertListValue` fills a 
`Lists.newArrayListWithCapacity(list.size())` with a plain `for` loop, and 
`convertMapValue` collects into `Maps.newHashMapWithExpectedSize(map.size())`, 
with the field ids and element/key/value types hoisted out of the per-element 
body. For lists this removes the stream pipeline allocation; for maps the 
pre-sizing avoids rehashing as the map grows. Behavior is unchanged (same 
elements and order, same collection types).
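
   As a minimal sketch of the pattern described above (not the actual `RecordConverter` code), the JDK-only example below pre-sizes both results and fills them with plain loops. The class name, method names, and `Function` parameters are hypothetical; the function arguments stand in for the per-element conversion with field ids and types already resolved once outside the loop. The PR itself uses the `Lists.newArrayListWithCapacity` and `Maps.newHashMapWithExpectedSize` helpers, which this sketch approximates with `ArrayList` and `HashMap` constructors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the pattern in this PR; not the Iceberg implementation.
final class PreSizedConversion {
  private PreSizedConversion() {}

  // List case: allocate the result at its final size, then fill it with a plain loop
  // instead of building a stream pipeline.
  static <T, R> List<R> convertList(List<T> input, Function<T, R> convertElement) {
    List<R> result = new ArrayList<>(input.size());
    for (T element : input) {
      result.add(convertElement.apply(element));
    }
    return result;
  }

  // Map case: pick an initial capacity large enough that input.size() entries
  // fit under the default 0.75 load factor, so the table is not resized while filling.
  static <K, V, K2, V2> Map<K2, V2> convertMap(
      Map<K, V> input, Function<K, K2> convertKey, Function<V, V2> convertValue) {
    int capacity = (int) (input.size() / 0.75f) + 1;
    Map<K2, V2> result = new HashMap<>(capacity);
    for (Map.Entry<K, V> entry : input.entrySet()) {
      result.put(convertKey.apply(entry.getKey()), convertValue.apply(entry.getValue()));
    }
    return result;
  }
}
```

   The capacity formula is an assumption made for this sketch to mimic an "expected size" helper with JDK classes only.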
   
   A throwaway A/B microbenchmark over the whole conversion method (200k 
iterations x 9 trials, median) gave the results below. The private per-element 
`convertValue` is identical in both versions and is replaced by the same 
identity stub on both sides, so the delta is exactly the structural change. 
Real `ListType`/`MapType` instances are used so the 
`fields().get(0).fieldId()` cost is faithful:
   
   | collection | size | before | after | time saved |
   |---|---|---|---|---|
   | list | 10 | 202.6 ns | 44.7 ns | 78% |
   | list | 100 | 1098 ns | 382 ns | 65% |
   | list | 1000 | 10746 ns | 3900 ns | 64% |
   | map | 10 | 190.3 ns | 164.5 ns | 14% |
   | map | 100 | 2568 ns | 1581 ns | 38% |
   | map | 1000 | 26198 ns | 15697 ns | 40% |
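
   The measurement procedure (timed batches, median across trials) could be sketched roughly as follows. This is a hypothetical harness, not the author's benchmark code, which is not part of this message; the class name, the `sink` field, and the `Supplier` parameter are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical wall-clock harness; a sketch of "N iterations x M trials, median", not JMH.
final class MedianBench {
  private MedianBench() {}

  // Results are written here so the JIT cannot discard the measured work as dead code.
  static volatile Object sink;

  // Median of the samples: middle value after sorting, or the mean of the two middle values.
  static double median(double[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("need at least one sample");
    }
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  // Runs `trials` timed batches of `iterations` calls and returns the median ns per call.
  static double medianNanosPerCall(Supplier<?> work, int iterations, int trials) {
    double[] perCall = new double[trials];
    for (int t = 0; t < trials; t++) {
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        sink = work.get();
      }
      perCall[t] = (System.nanoTime() - start) / (double) iterations;
    }
    return median(perCall);
  }
}
```

   With the figures quoted above, a call would look like `MedianBench.medianNanosPerCall(() -> convert(input), 200_000, 9)`, where `convert` is whichever version is being measured. The sketch has no separate warm-up phase; taking the median only dampens outlier trials.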
   
   In the table, sizes 100 and 1000 work out to roughly 6-7 ns saved per list 
element and ~10 ns per map entry, a saving that applies to every list/map field 
of every record. The numbers are wall-clock from a microbench, not JMH. The 
stubbed per-element conversion inflates the percentages, so the absolute 
per-element saving is what carries over.
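
   The per-element figures can be rechecked from the table with simple arithmetic. The helper below is only an illustration (the class and method names are made up for this note); the inputs are the table values.

```java
// Illustrative arithmetic check over the table values; not part of the PR.
final class SavingsCheck {
  private SavingsCheck() {}

  // Absolute saving per element or entry, in nanoseconds.
  static double savedPerElementNanos(double beforeNanos, double afterNanos, int size) {
    return (beforeNanos - afterNanos) / size;
  }

  // Relative reduction in time, as a percentage of the "before" value.
  static double percentReduction(double beforeNanos, double afterNanos) {
    return 100.0 * (beforeNanos - afterNanos) / beforeNanos;
  }
}
```

   For example, `savedPerElementNanos(10746, 3900, 1000)` is about 6.8 ns for lists and `savedPerElementNanos(26198, 15697, 1000)` is about 10.5 ns for maps. The size-10 rows come out differently (about 15.8 ns per list element and 2.6 ns per map entry), which would be consistent with fixed per-call costs weighing more at small sizes.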
   
   Existing `TestRecordConverter` covers list and map conversion.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

