zanmato1984 commented on PR #44053: URL: https://github.com/apache/arrow/pull/44053#issuecomment-2344193625
> I'm curious, is the `SimpleKeySegmenter` useful for performance?

If you mean whether `SimpleKeySegmenter` performs better than `AnyKeysSegmenter` for a "simple key", then yes. The former only has to perform a linear search on the input data, whereas the latter uses a heavy hash table, which introduces all sorts of overhead (memory allocation, insertion, lookup). In short, I think `SimpleKeySegmenter` is a worthwhile specialization for performance.

I hacked the code and got the benchmark numbers.

`SimpleKeySegmenter` (copied from my previous comment):
```
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1        197105 ns       196813 ns         3587 bytes_per_second=2.48093Gi/s items_per_second=166.493M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1        207379 ns       207241 ns         3384 bytes_per_second=2.3561Gi/s items_per_second=158.115M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1       254707 ns       254565 ns         2777 bytes_per_second=1.9181Gi/s items_per_second=128.721M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1       439539 ns       439415 ns         1589 bytes_per_second=1.11121Gi/s items_per_second=74.5719M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1     1225127 ns      1224517 ns          576 bytes_per_second=408.324Mi/s items_per_second=26.7599M/s
```

`AnyKeysSegmenter` (by hacking the code to use it even for a "simple key"):
```
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1        271931 ns       271563 ns         2571 bytes_per_second=1.79804Gi/s items_per_second=120.664M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1        335488 ns       330342 ns         2196 bytes_per_second=1.47811Gi/s items_per_second=99.194M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1       510618 ns       508622 ns         1387 bytes_per_second=983.048Mi/s items_per_second=64.425M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1       847002 ns       846245 ns          827 bytes_per_second=590.845Mi/s items_per_second=38.7216M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1     1745628 ns      1744707 ns          409 bytes_per_second=286.581Mi/s items_per_second=18.7814M/s
```
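
For readers unfamiliar with the idea, here is a minimal sketch of what "linear search" segmentation over a single fixed-width key column could look like. This is not the actual Arrow implementation: `SegmentSketch`, `NextSegment`, and `AllSegments` are hypothetical names, and the key column is assumed to be a plain `std::vector<std::int64_t>` with no nulls. It only illustrates why no hash table is needed when a segment is just a run of consecutive equal keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration only; not Arrow's segment struct.
struct SegmentSketch {
  std::size_t offset;  // index of the first row in the segment
  std::size_t length;  // number of consecutive rows sharing the same key
};

// Starting at `offset`, extend the segment while the key equals the first key.
// Cost is one comparison per row: no allocation, insertion, or lookup.
inline SegmentSketch NextSegment(const std::vector<std::int64_t>& keys,
                                 std::size_t offset) {
  assert(offset <= keys.size());
  if (offset == keys.size()) return {offset, 0};  // nothing left to scan
  const std::int64_t first = keys[offset];
  std::size_t end = offset + 1;
  while (end < keys.size() && keys[end] == first) ++end;
  return {offset, end - offset};
}

// Repeatedly call NextSegment to split the whole column into segments.
inline std::vector<SegmentSketch> AllSegments(
    const std::vector<std::int64_t>& keys) {
  std::vector<SegmentSketch> out;
  std::size_t offset = 0;
  while (offset < keys.size()) {
    const SegmentSketch s = NextSegment(keys, offset);
    out.push_back(s);
    offset += s.length;  // length is always >= 1 here, so the loop terminates
  }
  return out;
}
```

With keys `{1, 1, 2, 2, 2, 3}`, `AllSegments` yields three segments with lengths 2, 3, and 1. A general multi-key segmenter cannot compare raw values this cheaply, which is where the extra overhead measured above comes from.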
