wombatu-kun opened a new pull request, #16655:
URL: https://github.com/apache/iceberg/pull/16655

   When routing by a field (static routing with `iceberg.tables.route-field`, 
or dynamic routing), `SinkWriter` extracted the route value for every record 
via `RecordUtils.extractFromRecordValue(value, routeField)`, which re-parsed 
the dotted field path with `Splitter.on('.').splitToList(routeField)` on each 
call. The route field is fixed for the connector's lifetime, so this re-parse 
is pure per-record overhead.
   
   This splits the path once in the `SinkWriter` constructor and adds a 
`RecordUtils.extractFromRecordValue(Object, List<String>)` overload that takes 
the already-split path; the existing `String` overload now delegates to it, so 
other callers are unchanged. Behavior is identical.
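
The delegation pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the names `RoutePathSketch`, `splitPath`, and `extract` are hypothetical, only `Map` values are handled (the real `RecordUtils` method also handles Connect `Struct` values), and a hand-written split stands in for Guava's `Splitter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the RecordUtils overload pair; Map values only.
final class RoutePathSketch {
  private RoutePathSketch() {}

  // Splits a dotted path such as "data.id.key" into its segments.
  // Assumption: plain JDK code is used here instead of Guava's Splitter.
  static List<String> splitPath(String path) {
    List<String> parts = new ArrayList<>();
    int start = 0;
    int dot;
    while ((dot = path.indexOf('.', start)) >= 0) {
      parts.add(path.substring(start, dot));
      start = dot + 1;
    }
    parts.add(path.substring(start));
    return List.copyOf(parts);
  }

  // String overload: splits on every call, then delegates to the List overload.
  static Object extract(Object value, String path) {
    return extract(value, splitPath(path));
  }

  // List overload: walks the already-split path without re-parsing it.
  static Object extract(Object value, List<String> path) {
    Object current = value;
    for (String segment : path) {
      if (!(current instanceof Map)) {
        return null; // missing or non-map intermediate value
      }
      current = ((Map<?, ?>) current).get(segment);
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, Object> record =
        Map.of("data", Map.of("id", Map.of("key", "db.tbl")));

    // Split once (as a constructor would), then reuse for every record.
    List<String> routePath = splitPath("data.id.key");
    System.out.println(extract(record, routePath));

    // The String overload still works for callers that pass the raw path.
    System.out.println(extract(record, "data.id.key"));
  }
}
```

Keeping the `String` overload as a thin wrapper means existing call sites compile and behave as before, while a hot path can hold on to the pre-split list.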
   
   A throwaway A/B microbench over the whole `extractFromRecordValue` method 
(2M iterations x 9 trials, median; baseline = current `String` overload that 
splits per call, optimized = `List` overload with the path split once) showed:
   
   | record value | route field | before (per call) | after (per call) | time reduction |
   |---|---|---|---|---|
   | struct | `key` | 52.8 ns | 5.7 ns | 89% |
   | struct | `data.id.key` | 162.2 ns | 32.0 ns | 80% |
   | map | `key` | 54.3 ns | 6.1 ns | 89% |
   | map | `data.id.key` | 144.3 ns | 21.6 ns | 85% |
   
   That is roughly 47-48 ns saved per record for a single-segment route field and 
~120-130 ns for a three-segment path, on every record that goes through 
field-based routing. The numbers are indicative wall-clock timings from a 
hand-rolled microbenchmark, not JMH.
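
For readers who want to run a comparable measurement, here is a rough sketch of a median-of-trials timing loop. It is not the harness used for the table above: the class `MedianBenchSketch`, its method names, and the `String.split` workload in `main` are all assumptions made for illustration, and a loop like this is subject to the usual JIT and warm-up caveats that JMH is designed to handle.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical median-of-trials timer; indicative wall-clock only, not JMH.
final class MedianBenchSketch {
  private MedianBenchSketch() {}

  // Volatile sink so the JIT cannot discard the measured call as dead code.
  static volatile Object sink;

  // Runs `op` `iterations` times per trial and returns the median ns per call.
  static double medianNanosPerOp(Supplier<Object> op, int iterations, int trials) {
    double[] perOp = new double[trials];
    for (int t = 0; t < trials; t++) {
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        sink = op.get();
      }
      perOp[t] = (System.nanoTime() - start) / (double) iterations;
    }
    return median(perOp);
  }

  // Median of a non-empty array; averages the two middle values when even.
  static double median(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  public static void main(String[] args) {
    String path = "data.id.key";
    String[] preSplit = path.split("\\.");

    // Stand-in workloads: split on every call vs. reuse a pre-split path.
    double before = medianNanosPerOp(() -> path.split("\\."), 2_000_000, 9);
    double after = medianNanosPerOp(() -> preSplit, 2_000_000, 9);

    System.out.printf("split per call: %.1f ns, pre-split: %.1f ns%n", before, after);
  }
}
```

Taking the median across trials damps the effect of the early, not-yet-compiled trials and of occasional GC pauses, which is why it is a common choice for quick A/B checks.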
   
   Existing `TestSinkWriter` and `TestRecordUtils` cover both routing modes and 
the extraction overloads.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

