wombatu-kun opened a new pull request, #16654: URL: https://github.com/apache/iceberg/pull/16654
`RecordConverter.convertUUID` recomputed `FileFormat.PARQUET.name().toLowerCase(Locale.ROOT).equals(config.writeProps().get(DEFAULT_FILE_FORMAT))` for every UUID-typed value. The write file format is fixed for the converter's lifetime (`writeProps` is set once on the config), so this boolean is constant, yet `enum.name()` + `toLowerCase` allocated a fresh `"parquet"` String on every call, plus a map lookup and an equals.

This resolves the flag once in the constructor (`writeUuidAsBytes`), reducing `convertUUID` to a field read. Behavior is unchanged: the same 16-byte representation is returned for Parquet and the same UUID otherwise.

A throwaway A/B microbench over the whole `convertUUID` method (2M iterations x 9 trials, median; baseline mirrors the current inline expression, optimized uses the precomputed boolean) showed the per-value cost drop:

| input | format | before | after | faster |
|---|---|---|---|---|
| String | parquet | 53.6 ns | 32.5 ns | 39% |
| String | orc | 46.1 ns | 26.1 ns | 43% |
| UUID | parquet | 32.8 ns | 5.9 ns | 82% |
| UUID | orc | 22.3 ns | 2.4 ns | 89% |

That is roughly 20-27 ns saved per UUID value, about 40% of the method on String inputs (the common Kafka Connect case). The numbers are indicative wall-clock from a microbench, not JMH.

Existing `TestRecordConverter` covers the conversion (including `testUUIDConversionWithParquet`); its mock now defaults `writeProps()` to an empty map to mirror production, where it is never null.
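As a rough illustration of the change described above, the following is a minimal, self-contained sketch of resolving the format check once in a constructor and then branching on a field. It is not the actual `RecordConverter` code: the class name `UuidConverterSketch`, the local `FileFormat` enum, the `toBytes` helper, and the literal property key `write.format.default` are stand-ins assumed here so the example compiles without Iceberg on the classpath.

```java
import java.nio.ByteBuffer;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the converter; only the UUID path is sketched.
final class UuidConverterSketch {
  // Local stand-in for Iceberg's FileFormat enum (assumption for this sketch).
  enum FileFormat { PARQUET, ORC, AVRO }

  // Assumed value of the DEFAULT_FILE_FORMAT property key.
  static final String DEFAULT_FILE_FORMAT = "write.format.default";

  // Resolved once: the write format does not change over the converter's lifetime.
  private final boolean writeUuidAsBytes;

  UuidConverterSketch(Map<String, String> writeProps) {
    this.writeUuidAsBytes =
        FileFormat.PARQUET.name().toLowerCase(Locale.ROOT)
            .equals(writeProps.get(DEFAULT_FILE_FORMAT));
  }

  // Accepts a String or UUID; returns 16 bytes for Parquet, the UUID otherwise.
  Object convertUUID(Object value) {
    UUID uuid;
    if (value instanceof String) {
      uuid = UUID.fromString((String) value);
    } else if (value instanceof UUID) {
      uuid = (UUID) value;
    } else {
      throw new IllegalArgumentException(
          "Cannot convert to UUID: " + value.getClass().getName());
    }
    // Per-value cost of the format check is now a single field read.
    return writeUuidAsBytes ? toBytes(uuid) : uuid;
  }

  // Hypothetical helper: big-endian 16-byte encoding of the UUID.
  static byte[] toBytes(UUID uuid) {
    ByteBuffer buffer = ByteBuffer.allocate(16);
    buffer.putLong(uuid.getMostSignificantBits());
    buffer.putLong(uuid.getLeastSignificantBits());
    return buffer.array();
  }

  public static void main(String[] args) {
    UuidConverterSketch parquet =
        new UuidConverterSketch(Map.of(DEFAULT_FILE_FORMAT, "parquet"));
    UuidConverterSketch other = new UuidConverterSketch(Map.of());
    String input = "00112233-4455-6677-8899-aabbccddeeff";
    System.out.println(((byte[]) parquet.convertUUID(input)).length);
    System.out.println(other.convertUUID(input));
  }
}
```

An empty `writeProps` map, as in the updated test mock, makes the flag `false`, so the UUID is returned unchanged.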
