wombatu-kun opened a new pull request, #16655:
URL: https://github.com/apache/iceberg/pull/16655
When routing by a field (static routing with `iceberg.tables.route-field`,
or dynamic routing), `SinkWriter` extracted the route value for every record
via `RecordUtils.extractFromRecordValue(value, routeField)`, which re-parsed
the dotted field path with `Splitter.on('.').splitToList(routeField)` on each
call. The route field is fixed for the connector's lifetime, so this re-parse
is pure per-record overhead.
This splits the path once in the `SinkWriter` constructor and adds a
`RecordUtils.extractFromRecordValue(Object, List<String>)` overload that takes
the already-split path; the existing `String` overload now delegates to it, so
other callers are unchanged. Behavior is identical.
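The split-once pattern described above can be sketched as follows. This is a minimal illustration, not the connector's code: `RoutePathSketch`, `Router`, and `extract` are hypothetical names, the sketch only walks nested `Map` values (the real method also handles struct values), it uses `String.split` in place of Guava's `Splitter`, and returning `null` for a missing segment is an assumption rather than confirmed behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class RoutePathSketch {

  // String overload: splits the dotted path on every call, then delegates.
  static Object extract(Object value, String fieldPath) {
    return extract(value, Arrays.asList(fieldPath.split("\\.")));
  }

  // List overload: walks nested maps along an already-split path.
  static Object extract(Object value, List<String> path) {
    Object current = value;
    for (String segment : path) {
      if (!(current instanceof Map)) {
        return null; // assumption: a non-map intermediate value yields null
      }
      current = ((Map<?, ?>) current).get(segment);
    }
    return current;
  }

  // Hypothetical stand-in for the writer: the route path is split once at construction.
  static final class Router {
    private final List<String> routePath;

    Router(String routeField) {
      this.routePath = List.copyOf(Arrays.asList(routeField.split("\\.")));
    }

    Object routeValue(Object recordValue) {
      // No per-record parsing: reuse the path split in the constructor.
      return extract(recordValue, routePath);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> record = Map.of("data", Map.of("id", Map.of("key", "db.tbl")));
    Router router = new Router("data.id.key");
    System.out.println(router.routeValue(record));      // pre-split path
    System.out.println(extract(record, "data.id.key")); // per-call split, same result
  }
}
```

Because the `String` overload only splits and delegates, both entry points share one traversal, which is why existing callers see the same results.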
A throwaway A/B microbench over the whole `extractFromRecordValue` method
(2M iterations x 9 trials, median; baseline = current `String` overload that
splits per call, optimized = `List` overload with the path split once) showed:
| record value | route field | before | after | time saved |
|---|---|---|---|---|
| struct | `key` | 52.8 ns | 5.7 ns | 89% |
| struct | `data.id.key` | 162.2 ns | 32.0 ns | 80% |
| map | `key` | 54.3 ns | 6.1 ns | 89% |
| map | `data.id.key` | 144.3 ns | 21.6 ns | 85% |
That is roughly 47-48 ns saved per record for a single-segment route field and
~120-130 ns for a three-segment path, a cost the routing path previously paid on every record.
The numbers are indicative wall-clock timings from an ad hoc microbenchmark, not JMH results.
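For readers who want to reproduce the shape of such a measurement, here is a rough sketch of an A/B timing loop with the same iteration and trial counts. It is not the harness used for the table: `AbBenchSketch` and its methods are hypothetical, it times only the path split rather than the whole `extractFromRecordValue` method, and like any hand-rolled wall-clock loop it is subject to JIT and dead-code effects that JMH is designed to control.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

public class AbBenchSketch {
  static final int ITERATIONS = 2_000_000;
  static final int TRIALS = 9;

  // Median of the samples; the input array is left unmodified.
  static double median(double[] samples) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  // Runs TRIALS timed loops of ITERATIONS calls each and returns the median ns per call.
  static double medianNanosPerOp(IntSupplier op) {
    double[] perOp = new double[TRIALS];
    long sink = 0; // accumulate results so the calls are not trivially dead code
    for (int t = 0; t < TRIALS; t++) {
      long start = System.nanoTime();
      for (int i = 0; i < ITERATIONS; i++) {
        sink += op.getAsInt();
      }
      perOp[t] = (System.nanoTime() - start) / (double) ITERATIONS;
    }
    if (sink == Long.MIN_VALUE) {
      System.out.println(sink); // never expected to print; keeps sink observable
    }
    return median(perOp);
  }

  public static void main(String[] args) {
    String routeField = "data.id.key";
    List<String> preSplit = Arrays.asList(routeField.split("\\."));

    // A: split the dotted path on every call. B: reuse the path split once.
    double before = medianNanosPerOp(() -> Arrays.asList(routeField.split("\\.")).size());
    double after = medianNanosPerOp(() -> preSplit.size());

    System.out.printf("before=%.1f ns, after=%.1f ns%n", before, after);
  }
}
```

Taking the median over several trials damps outliers from warm-up and GC pauses; the absolute numbers will vary by machine and JVM.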
Existing `TestSinkWriter` and `TestRecordUtils` cover both routing modes and
the extraction overloads.