zanmato1984 commented on PR #44053:
URL: https://github.com/apache/arrow/pull/44053#issuecomment-2344193625

   > I'm curious, is the `SimpleKeySegmenter` useful for performance?
   
   If you mean whether `SimpleKeySegmenter` performs better than 
`AnyKeysSegmenter` on a "simple key", then yes. The former only has to do a 
linear scan over the input data, whereas the latter uses a heavyweight hash 
table, which adds all sorts of overhead (memory allocation, insertion, lookup). 
In short, I think `SimpleKeySegmenter` is a worthwhile specialization for 
performance.
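   
   To illustrate the difference in per-row work, here is a minimal sketch. 
It is not the actual Arrow code: `SegmentLengthByScan` and 
`SegmentLengthByHash` are hypothetical helpers, the key is assumed to be a 
single `int64_t` column, and `std::unordered_map` stands in for the real 
grouper's hash table.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <unordered_map>
   #include <vector>

   // Hypothetical helpers for illustration only; not Arrow's implementation.

   // Linear scan: length of the run of rows equal to keys[offset].
   // One comparison per row, no allocation.
   std::size_t SegmentLengthByScan(const std::vector<int64_t>& keys,
                                   std::size_t offset) {
     assert(offset <= keys.size());
     if (offset == keys.size()) return 0;
     const int64_t first = keys[offset];
     std::size_t i = offset + 1;
     while (i < keys.size() && keys[i] == first) ++i;
     return i - offset;
   }

   // Hash-based: assign every row a group id through a hash table, then
   // measure the leading run of identical group ids. Same answer, but each
   // row pays for a hash lookup/insert and the id buffer must be allocated.
   std::size_t SegmentLengthByHash(const std::vector<int64_t>& keys,
                                   std::size_t offset) {
     assert(offset <= keys.size());
     if (offset == keys.size()) return 0;
     std::unordered_map<int64_t, uint32_t> group_ids;
     std::vector<uint32_t> ids;
     ids.reserve(keys.size() - offset);
     for (std::size_t i = offset; i < keys.size(); ++i) {
       // emplace() keeps the existing id if the key was already seen.
       auto it = group_ids
                     .emplace(keys[i], static_cast<uint32_t>(group_ids.size()))
                     .first;
       ids.push_back(it->second);
     }
     std::size_t len = 1;
     while (len < ids.size() && ids[len] == ids[0]) ++len;
     return len;
   }
   ```
   
   Both functions return the same segment length; the second just does 
strictly more work per row, which is the kind of overhead the specialization 
avoids.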
   
   I hacked the code to get benchmark numbers for both.
   
   `SimpleKeySegmenter` (copied from my previous comment):
   ```
   BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1       197105 ns       196813 ns         3587 bytes_per_second=2.48093Gi/s items_per_second=166.493M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1       207379 ns       207241 ns         3384 bytes_per_second=2.3561Gi/s items_per_second=158.115M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1      254707 ns       254565 ns         2777 bytes_per_second=1.9181Gi/s items_per_second=128.721M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1      439539 ns       439415 ns         1589 bytes_per_second=1.11121Gi/s items_per_second=74.5719M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1    1225127 ns      1224517 ns          576 bytes_per_second=408.324Mi/s items_per_second=26.7599M/s
   ```
   
   `AnyKeysSegmenter` (by hacking the code to use it even for a "simple key"):
   ```
   BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1       271931 ns       271563 ns         2571 bytes_per_second=1.79804Gi/s items_per_second=120.664M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1       335488 ns       330342 ns         2196 bytes_per_second=1.47811Gi/s items_per_second=99.194M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1      510618 ns       508622 ns         1387 bytes_per_second=983.048Mi/s items_per_second=64.425M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1      847002 ns       846245 ns          827 bytes_per_second=590.845Mi/s items_per_second=38.7216M/s
   BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1    1745628 ns      1744707 ns          409 bytes_per_second=286.581Mi/s items_per_second=18.7814M/s
   ```
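
   For a quick side-by-side, the ratio between the two runs can be computed 
from the first time column of each row (assuming that column is wall time, as 
in Google Benchmark's default output). The sketch below only copies the 
numbers from the two listings above; `Sample`, `BenchmarkSamples`, and 
`Slowdown` are hypothetical names used for illustration.

   ```cpp
   #include <cassert>
   #include <vector>

   // One row of the comparison: segment count plus the first time column
   // (in ns) from each of the two listings.
   struct Sample {
     int segments;
     double simple_ns;    // SimpleKeySegmenter
     double any_keys_ns;  // AnyKeysSegmenter
   };

   // Values copied verbatim from the benchmark output above.
   std::vector<Sample> BenchmarkSamples() {
     return {
         {1, 197105, 271931},
         {4, 207379, 335488},
         {16, 254707, 510618},
         {64, 439539, 847002},
         {256, 1225127, 1745628},
     };
   }

   // How many times slower AnyKeysSegmenter was for a given row.
   double Slowdown(const Sample& s) {
     assert(s.simple_ns > 0);
     return s.any_keys_ns / s.simple_ns;
   }
   ```

   On these numbers the ratio falls roughly between 1.4x and 2.0x, with the 
largest gap at 16 segments.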

