0lai0 commented on PR #4364: URL: https://github.com/apache/datafusion-comet/pull/4364#issuecomment-4506911578
I added a small focused microbenchmark for the `VarCharVector` / `getUTF8String` path.

On my local arm64 macOS machine with OpenJDK 17.0.18, using 1M rows, 3 warmup iterations, and 5 measured rounds:

| Case | Total time | Time per row |
|---|---:|---:|
| BEFORE: fetch `offsetBufferAddress` on every call | 4,603,250 ns | 4.60 ns/row |
| AFTER: cached `offsetBufferAddress` | 456,167 ns | 0.46 ns/row |

This shows about a **10x improvement** on this tight loop. The offset buffer address is stable for the lifetime of the vector, so caching it is safe.

This is a simple timing benchmark rather than full JMH, but it directly measures the targeted path. The benefit is most visible for queries that scan large string columns row-by-row, such as Parquet reads with string filters.
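For readers who want to see the shape of the BEFORE/AFTER comparison, here is a minimal, dependency-free sketch. It is not the benchmark from this PR and does not use Arrow or Comet classes: `FakeVarCharVector`, `UncachedReader`, `CachedReader`, and `timeNanos` are hypothetical names, and a heap `ByteBuffer` stands in for the off-heap offset buffer. Because the stand-in lookup is a trivial getter that the JIT can inline, this sketch should not be expected to reproduce the numbers in the table; it only illustrates the pattern of fetching the offset buffer once instead of on every call, with the same 1M rows, 3 warmup iterations, and 5 measured rounds.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class OffsetCacheSketch {

    /** Hypothetical stand-in for a variable-width string vector (not Arrow's VarCharVector). */
    static final class FakeVarCharVector {
        private final ByteBuffer offsets; // (valueCount + 1) int32 byte offsets into data
        private final byte[] data;        // concatenated UTF-8 bytes of all values
        final int valueCount;

        FakeVarCharVector(String[] values) {
            valueCount = values.length;
            offsets = ByteBuffer.allocate((valueCount + 1) * Integer.BYTES)
                                .order(ByteOrder.LITTLE_ENDIAN);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            offsets.putInt(0, 0);
            for (int i = 0; i < valueCount; i++) {
                byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
                offsets.putInt((i + 1) * Integer.BYTES, out.size());
            }
            data = out.toByteArray();
        }

        ByteBuffer getOffsetBuffer() { return offsets; }
        byte[] getDataBuffer() { return data; }
    }

    interface Reader { String getString(int row); }

    /** BEFORE pattern: the offset buffer is looked up on every call. */
    static final class UncachedReader implements Reader {
        private final FakeVarCharVector vector;
        UncachedReader(FakeVarCharVector vector) { this.vector = vector; }

        public String getString(int row) {
            ByteBuffer off = vector.getOffsetBuffer(); // fetched per call
            int start = off.getInt(row * Integer.BYTES);
            int end = off.getInt((row + 1) * Integer.BYTES);
            return new String(vector.getDataBuffer(), start, end - start, StandardCharsets.UTF_8);
        }
    }

    /** AFTER pattern: buffers are fetched once; valid only while they stay stable. */
    static final class CachedReader implements Reader {
        private final ByteBuffer offsets;
        private final byte[] data;
        CachedReader(FakeVarCharVector vector) {
            this.offsets = vector.getOffsetBuffer(); // cached for the reader's lifetime
            this.data = vector.getDataBuffer();
        }

        public String getString(int row) {
            int start = offsets.getInt(row * Integer.BYTES);
            int end = offsets.getInt((row + 1) * Integer.BYTES);
            return new String(data, start, end - start, StandardCharsets.UTF_8);
        }
    }

    /** Times one pass over all rows; the sink keeps the loop from being optimized away. */
    static long timeNanos(Reader reader, int rows) {
        long sink = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            sink += reader.getString(i).length();
        }
        long elapsed = System.nanoTime() - t0;
        if (sink < 0) throw new IllegalStateException("unreachable");
        return elapsed;
    }

    public static void main(String[] args) {
        int rows = 1_000_000;
        String[] values = new String[rows];
        for (int i = 0; i < rows; i++) values[i] = "row-" + i;
        FakeVarCharVector vector = new FakeVarCharVector(values);
        Reader[] readers = { new UncachedReader(vector), new CachedReader(vector) };
        String[] labels = { "BEFORE (uncached)", "AFTER (cached)" };
        for (int r = 0; r < readers.length; r++) {
            for (int w = 0; w < 3; w++) timeNanos(readers[r], rows); // warmup
            long best = Long.MAX_VALUE;
            for (int m = 0; m < 5; m++) best = Math.min(best, timeNanos(readers[r], rows));
            System.out.println(labels[r] + ": " + best + " ns, "
                    + (double) best / rows + " ns/row");
        }
    }
}
```

The cached variant is only correct under the condition stated above: the buffer must not be reallocated while the reader is in use. A reader built this way would need to be recreated (or its cached fields refreshed) if the underlying vector were ever resized or reloaded.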
