mapleFU commented on issue #40845:
URL: https://github.com/apache/arrow/issues/40845#issuecomment-2041524939

   I wrote a naive BMI2 implementation. Results on an Intel Xeon:
   
   ```
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1           4729 ns   4734 ns   149748  bytes_per_second=3.18578Gi/s  items_per_second=1.71035G/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7          14434 ns  14453 ns    48206  bytes_per_second=1.04339Gi/s  items_per_second=560.166M/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024        2763 ns   2769 ns   251599  bytes_per_second=5.44634Gi/s  items_per_second=2.92398G/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1           4311 ns   4316 ns   162065  bytes_per_second=3.49358Gi/s  items_per_second=1.8756G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1           4122 ns   4125 ns   169853  bytes_per_second=3.65559Gi/s  items_per_second=1.96258G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1           3886 ns   3888 ns   179847  bytes_per_second=3.8783Gi/s  items_per_second=2.08215G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7          13068 ns  13080 ns    53376  bytes_per_second=1.15291Gi/s  items_per_second=618.963M/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1       3742 ns   3748 ns   186340  bytes_per_second=4.02357Gi/s  items_per_second=2.16014G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7       3745 ns   3750 ns   186374  bytes_per_second=4.02114Gi/s  items_per_second=2.15883G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024    3742 ns   3747 ns   186728  bytes_per_second=4.02498Gi/s  items_per_second=2.1609G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1       3424 ns   3429 ns   204069  bytes_per_second=4.39745Gi/s  items_per_second=2.36086G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1       3499 ns   3504 ns   199696  bytes_per_second=4.30356Gi/s  items_per_second=2.31045G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1       3311 ns   3315 ns   208379  bytes_per_second=4.54882Gi/s  items_per_second=2.44213G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7       3499 ns   3506 ns   199599  bytes_per_second=4.3016Gi/s  items_per_second=2.3094G/s
   ```
   
   Before:
   
   ```
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1           5534 ns   5541 ns   124334  bytes_per_second=2.7213Gi/s  items_per_second=1.46099G/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7          16683 ns  16703 ns    42139  bytes_per_second=924.523Mi/s  items_per_second=484.716M/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024        2594 ns   2597 ns   269554  bytes_per_second=5.80692Gi/s  items_per_second=3.11757G/s
   ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1           4980 ns   4985 ns   140579  bytes_per_second=3.02534Gi/s  items_per_second=1.62421G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1           4782 ns   4786 ns   146402  bytes_per_second=3.15082Gi/s  items_per_second=1.69158G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1           4397 ns   4402 ns   159011  bytes_per_second=3.42562Gi/s  items_per_second=1.83912G/s
   ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7          15721 ns  15736 ns    44453  bytes_per_second=981.292Mi/s  items_per_second=514.48M/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1       3272 ns   3274 ns   213994  bytes_per_second=4.60583Gi/s  items_per_second=2.47273G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7       3272 ns   3273 ns   213769  bytes_per_second=4.60763Gi/s  items_per_second=2.4737G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024    3272 ns   3272 ns   213713  bytes_per_second=4.60907Gi/s  items_per_second=2.47447G/s
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1       3270 ns   3273 ns   213934  bytes_per_second=4.60708Gi/s  items_per_second=2.47341G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1       3314 ns   3321 ns   210344  bytes_per_second=4.5405Gi/s  items_per_second=2.43766G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1       3273 ns   3279 ns   213189  bytes_per_second=4.5986Gi/s  items_per_second=2.46885G/s
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7       3310 ns   3318 ns   211121  bytes_per_second=4.54534Gi/s  items_per_second=2.44026G/s
   ```
   
   In the Rle ReadLevels scenario, performance generally improves, but for
BitPack it actually gets slower. I guess this could benefit performance when
the number of inputs is small.
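
To make the comparison easier to read, here is a minimal sketch that computes the before/after ratio for each case from the CPU-time column of the benchmark output above. The dictionary layout and the `speedup` helper are illustrative names, not part of Arrow; the numbers are copied from the two runs.

```python
# CPU time in ns, copied from the two benchmark runs above: (before, after).
# Keys are (encoding, MaxLevel, BatchSize, LevelRepeatCount).
cpu_ns = {
    ("Rle", 1, 1024, 1): (5541, 4734),
    ("Rle", 1, 1024, 7): (16703, 14453),
    ("Rle", 1, 1024, 1024): (2597, 2769),
    ("Rle", 1, 2048, 1): (4985, 4316),
    ("Rle", 3, 1024, 1): (4786, 4125),
    ("Rle", 3, 2048, 1): (4402, 3888),
    ("Rle", 3, 1024, 7): (15736, 13080),
    ("BitPack", 1, 1024, 1): (3274, 3748),
    ("BitPack", 1, 1024, 7): (3273, 3750),
    ("BitPack", 1, 1024, 1024): (3272, 3747),
    ("BitPack", 1, 2048, 1): (3273, 3429),
    ("BitPack", 3, 1024, 1): (3321, 3504),
    ("BitPack", 3, 2048, 1): (3279, 3315),
    ("BitPack", 3, 1024, 7): (3318, 3506),
}


def speedup(before_ns, after_ns):
    """Return before/after; values above 1.0 mean the BMI2 run was faster."""
    return before_ns / after_ns


# One ratio per benchmark case.
ratios = {case: speedup(before, after) for case, (before, after) in cpu_ns.items()}

for (encoding, max_level, batch, repeat), ratio in ratios.items():
    verdict = "faster" if ratio > 1.0 else "slower"
    print(f"{encoding:8s} MaxLevel={max_level} BatchSize={batch:5d} "
          f"LevelRepeatCount={repeat:5d}  {ratio:.3f}x ({verdict})")
```

With these numbers, six of the seven Rle cases come out faster (the exception is LevelRepeatCount:1024), and all seven BitPack cases come out slower.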

