zeroshade commented on pull request #10716:
URL: https://github.com/apache/arrow/pull/10716#issuecomment-896879021


   @emkornfield The performance difference actually depends entirely on the data and types (which is why I left both implementations in here).
   
   My current theory is that the custom hash table implementation (which I took from the C++ memo table implementation) simply handles small values in an optimized way, resulting in significantly fewer allocations.
   
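   For reference, the go_map side can be sketched as a thin wrapper over Go's builtin map. This is a hypothetical, minimal version to illustrate the approach, not the PR's actual code:

   ```go
   package main

   import "fmt"

   // A minimal sketch of the go_map approach: Go's builtin map handles
   // hashing and collision resolution, but growing it allocates
   // incrementally, bucket by bucket.
   type mapMemoTable struct {
   	table map[float64]int32
   }

   func newMapMemoTable() *mapMemoTable {
   	return &mapMemoTable{table: make(map[float64]int32)}
   }

   // GetOrInsert returns the memo index for v and whether v was already present.
   func (m *mapMemoTable) GetOrInsert(v float64) (int32, bool) {
   	if idx, ok := m.table[v]; ok {
   		return idx, true
   	}
   	idx := int32(len(m.table))
   	m.table[v] = idx
   	return idx, false
   }

   func main() {
   	m := newMapMemoTable()
   	for _, v := range []float64{1.5, 2.5, 1.5} {
   		idx, found := m.GetOrInsert(v)
   		fmt.Println(idx, found)
   	}
   }
   ```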
   With go1.16.1 on my local laptop:
   
   ```
   goos: windows
   goarch: amd64
   pkg: github.com/apache/arrow/go/parquet/internal/encoding
   cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
   BenchmarkMemoTableFloat64/100_unique_n_65535/go_map-12                    100          12464686 ns/op            7912 B/op         15 allocs/op
   BenchmarkMemoTableFloat64/100_unique_n_65535/xxh3-12                      230           5325610 ns/op            7680 B/op          2 allocs/op
   BenchmarkMemoTableFloat64/1000_unique_n_65535/go_map-12                    85          14819479 ns/op          123337 B/op         70 allocs/op
   BenchmarkMemoTableFloat64/1000_unique_n_65535/xxh3-12                     186           5388082 ns/op          130560 B/op          4 allocs/op
   BenchmarkMemoTableFloat64/5000_unique_n_65535/go_map-12                    79          16167963 ns/op          493125 B/op        195 allocs/op
   BenchmarkMemoTableFloat64/5000_unique_n_65535/xxh3-12                     132           7631678 ns/op          523776 B/op          5 allocs/op
   BenchmarkMemoTableInt32/100_unique_n_65535/xxh3-12                        247           4648034 ns/op            5120 B/op          2 allocs/op
   BenchmarkMemoTableInt32/100_unique_n_65535/go_map-12                      602           1885666 ns/op            6468 B/op         17 allocs/op
   BenchmarkMemoTableInt32/1000_unique_n_65535/xxh3-12                      1429            895664 ns/op           87040 B/op          4 allocs/op
   BenchmarkMemoTableInt32/1000_unique_n_65535/go_map-12                     632           1893139 ns/op          104802 B/op         72 allocs/op
   BenchmarkMemoTableInt32/5000_unique_n_65535/xxh3-12                      1020           1295938 ns/op          349184 B/op          5 allocs/op
   BenchmarkMemoTableInt32/5000_unique_n_65535/go_map-12                     489           2457437 ns/op          420230 B/op        187 allocs/op
   BenchmarkMemoTable/100_unique_len_32-32_n_65535/xxh3-12                   398           2990361 ns/op           16904 B/op         23 allocs/op
   BenchmarkMemoTable/100_unique_len_32-32_n_65535/go_map-12                 428           2811322 ns/op           19799 B/op         28 allocs/op
   BenchmarkMemoTable/100_unique_len_8-32_n_65535/xxh3-12                    356           3463212 ns/op           16904 B/op         23 allocs/op
   BenchmarkMemoTable/100_unique_len_8-32_n_65535/go_map-12                  380           3149174 ns/op           19781 B/op         28 allocs/op
   BenchmarkMemoTable/1000_unique_len_32-32_n_65535/xxh3-12                  366           3188208 ns/op          176200 B/op         32 allocs/op
   BenchmarkMemoTable/1000_unique_len_32-32_n_65535/go_map-12                361           9588561 ns/op          211407 B/op         67 allocs/op
   BenchmarkMemoTable/1000_unique_len_8-32_n_65535/xxh3-12                   336           3529636 ns/op          176200 B/op         32 allocs/op
   BenchmarkMemoTable/1000_unique_len_8-32_n_65535/go_map-12                 326           3676726 ns/op          211358 B/op         67 allocs/op
   BenchmarkMemoTable/5000_unique_len_32-32_n_65535/xxh3-12                  252           4702015 ns/op          992584 B/op         42 allocs/op
   BenchmarkMemoTable/5000_unique_len_32-32_n_65535/go_map-12                255           4884533 ns/op         1131141 B/op        181 allocs/op
   BenchmarkMemoTable/5000_unique_len_8-32_n_65535/xxh3-12                   235           4814247 ns/op          722248 B/op         41 allocs/op
   BenchmarkMemoTable/5000_unique_len_8-32_n_65535/go_map-12                 244           5340692 ns/op          860673 B/op        179 allocs/op
   BenchmarkMemoTableAllUnique/values_1024_len_32-32/go_map-12              4509            271040 ns/op          211432 B/op         67 allocs/op
   BenchmarkMemoTableAllUnique/values_1024_len_32-32/xxh3-12                8245            150411 ns/op          176200 B/op         32 allocs/op
   BenchmarkMemoTableAllUnique/values_1024_len_8-32/go_map-12               4552            255188 ns/op          211443 B/op         67 allocs/op
   BenchmarkMemoTableAllUnique/values_1024_len_8-32/xxh3-12                 9828            148377 ns/op          176200 B/op         32 allocs/op
   BenchmarkMemoTableAllUnique/values_32767_len_32-32/go_map-12              100          11416073 ns/op         6324082 B/op       1176 allocs/op
   BenchmarkMemoTableAllUnique/values_32767_len_32-32/xxh3-12                222           5578530 ns/op         3850569 B/op         49 allocs/op
   BenchmarkMemoTableAllUnique/values_32767_len_8-32/go_map-12               100          11123497 ns/op         6323152 B/op       1171 allocs/op
   BenchmarkMemoTableAllUnique/values_32767_len_8-32/xxh3-12                 212           6094342 ns/op         3850569 B/op         49 allocs/op
   BenchmarkMemoTableAllUnique/values_65535_len_32-32/go_map-12               44          25062816 ns/op        12580560 B/op       2384 allocs/op
   BenchmarkMemoTableAllUnique/values_65535_len_32-32/xxh3-12                 74          15704849 ns/op        10430028 B/op         53 allocs/op
   BenchmarkMemoTableAllUnique/values_65535_len_8-32/go_map-12                40          26667808 ns/op        12580008 B/op       2381 allocs/op
   BenchmarkMemoTableAllUnique/values_65535_len_8-32/xxh3-12                  81          14417075 ns/op        10430024 B/op         53 allocs/op
   ```
   The ones labeled xxh3 use my custom implementation, while go_map uses the implementation based on Go's builtin map. Looking closely at the results, in *most* cases the xxh3-based implementation is significantly faster, sometimes twice as fast, with far fewer allocations per loop. For example, in the binary case with all unique values, the go-map-based implementation performs 2384 allocations versus 53 in my custom one, with the custom implementation being ~37% faster.
   
   But in some other cases, for example the `100_unique_len_32-32_n_65535` and `100_unique_len_8-32_n_65535` cases, which correspond to 65535 binary strings of length 32 (or of lengths between 8 and 32) with exactly 100 unique values among them, the go-map-based implementation is actually slightly more performant despite a few more allocations. The same thing happens with the Int32 memo table with only 100 unique values over 65535 values, but once we increase to 1000 or 5000 unique values my custom one does better. This seems to indicate that the builtin Go map is faster when the cardinality of unique values is low, while my custom implementation is more performant at inserting new values. The exception is the Float64 case, where my custom implementation is faster across the board, even with only 100 unique values, which I attribute to xxh3 hashing small byte values faster.
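   The allocation behavior described above can be sketched as a flat open-addressing table, loosely modeled on the C++ memo table idea the custom implementation was ported from. This is a simplified, hypothetical version (resizing is omitted, and the hash mixer is a stand-in for xxh3); the real code is in the PR:

   ```go
   package main

   import (
   	"fmt"
   	"math"
   )

   // slot holds one entry; memo == -1 marks an empty slot.
   type slot struct {
   	value float64
   	memo  int32
   }

   // flatMemoTable keeps all slots in one backing slice, so the table costs a
   // single allocation, versus the builtin map's incremental bucket growth.
   // Resizing is omitted for brevity: capacity must exceed the number of
   // distinct values, or linear probing would loop forever.
   type flatMemoTable struct {
   	slots []slot
   	mask  uint64 // len(slots)-1; len is a power of two
   	size  int32
   }

   func newFlatMemoTable(capacity int) *flatMemoTable {
   	n := 8
   	for n < capacity {
   		n *= 2
   	}
   	t := &flatMemoTable{slots: make([]slot, n), mask: uint64(n - 1)}
   	for i := range t.slots {
   		t.slots[i].memo = -1
   	}
   	return t
   }

   // hashFloat64 is a stand-in bit mixer; the real implementation uses xxh3.
   func hashFloat64(v float64) uint64 {
   	h := math.Float64bits(v)
   	h ^= h >> 33
   	h *= 0xff51afd7ed558ccd
   	h ^= h >> 33
   	return h
   }

   // GetOrInsert probes linearly from the hashed slot; neither lookups nor
   // inserts allocate.
   func (t *flatMemoTable) GetOrInsert(v float64) (int32, bool) {
   	for i := hashFloat64(v) & t.mask; ; i = (i + 1) & t.mask {
   		s := &t.slots[i]
   		if s.memo == -1 {
   			s.value, s.memo = v, t.size
   			t.size++
   			return s.memo, false
   		}
   		if s.value == v {
   			return s.memo, true
   		}
   	}
   }

   func main() {
   	t := newFlatMemoTable(16)
   	for _, v := range []float64{1.5, 2.5, 1.5} {
   		idx, found := t.GetOrInsert(v)
   		fmt.Println(idx, found)
   	}
   }
   ```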
   
   **TL;DR**: In *most* cases the custom implementation is faster, but in some cases with a low cardinality of unique values (specifically Int32 and binary strings) the implementation built on Go's map can be more performant.
   

