clintropolis opened a new pull request #11004:
URL: https://github.com/apache/druid/pull/11004


   ### Description
   This PR adds specialized `getDelta` and `getTable` methods to the `LongDeserializer` implementations. These methods push bit unpacking, delta-encoding adjustment, and table lookups for vectors of values as far down as possible, so that the decoders work more efficiently with vectorized query engines. The focus is primarily on contiguous reads.
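
As a rough illustration of what "pushing down" means here (a sketch only, not the code in this PR), the snippet below decodes a run of 8-bit packed values and applies the delta base or the table lookup inside the same loop, instead of reading each value and adjusting it afterwards. The class name, the explicit `ByteBuffer` parameter, and the method signatures are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class DeltaTableSketch {
  // Hypothetical helper: decode `length` 8-bit values starting at row `startIndex`
  // and add the delta-encoding base to each value inside the same loop.
  static void getDelta(ByteBuffer buf, long[] out, int outPosition, int startIndex, int length, long base) {
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = base + (buf.get(startIndex + i) & 0xFF);
    }
  }

  // Hypothetical helper: decode `length` 8-bit table indexes starting at row `startIndex`
  // and resolve each one against the lookup table inside the same loop.
  static void getTable(ByteBuffer buf, long[] out, int outPosition, int startIndex, int length, long[] table) {
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = table[buf.get(startIndex + i) & 0xFF];
    }
  }

  public static void main(String[] args) {
    ByteBuffer packed = ByteBuffer.wrap(new byte[] {0, 1, 2, 3});
    long[] out = new long[4];

    // Delta encoding: stored values are offsets from a base value.
    getDelta(packed, out, 0, 0, 4, 1000L);
    System.out.println(Arrays.toString(out));

    // Table encoding: stored values are indexes into a table of distinct values.
    getTable(packed, out, 0, 0, 4, new long[] {7L, 11L, 13L, 17L});
    System.out.println(Arrays.toString(out));
  }
}
```

The point of the sketch is only that the output vector is filled in a single pass, with no separate adjustment step.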
   
   It works by unrolling value reads to line up with `ByteBuffer` get methods, eliminating overlapping reads where possible. Bit-packing widths that are not `ByteBuffer`-aligned (1, 2, 4, 12, 20, 24, 40, 48, 56) are read in blocks of 8 values at a time. The `ByteBuffer`-aligned widths (8, 16, 32, 64) actually performed worse when the same unrolling was applied, so they instead use a traditional for loop with the aligned get methods.
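
To make the block-of-8 idea concrete, here is a minimal sketch for the 4-bit width, where 8 values occupy exactly 32 bits and can be fetched with a single `getInt`. It is not the implementation from this PR: the class and method names are hypothetical, and it assumes a big-endian buffer (the `ByteBuffer` default) with the even-indexed value stored in the high nibble of each byte.

```java
import java.nio.ByteBuffer;

final class Unrolled4BitSketch {
  // Scalar reference read: one byte fetch per value (assumed layout: even index = high nibble).
  static long getScalar(ByteBuffer buf, int index) {
    int shift = ((index + 1) & 1) << 2;
    return (buf.get(index >> 1) >> shift) & 0xF;
  }

  // Contiguous read of `length` values starting at `startIndex`, 8 values per getInt.
  static void getBlocked(ByteBuffer buf, long[] out, int outPosition, int startIndex, int length) {
    int index = startIndex;
    int i = 0;

    // Head: read single values until the row index sits on a block-of-8 boundary.
    while ((index & 0x7) != 0 && i < length) {
      out[outPosition + i++] = getScalar(buf, index++);
    }

    // Body: one 32-bit read covers 8 values; the unpacking is unrolled by hand.
    for (; i + 8 <= length; index += 8, i += 8) {
      int unpack = buf.getInt(index >> 1);
      int o = outPosition + i;
      out[o] = (unpack >>> 28) & 0xF;
      out[o + 1] = (unpack >>> 24) & 0xF;
      out[o + 2] = (unpack >>> 20) & 0xF;
      out[o + 3] = (unpack >>> 16) & 0xF;
      out[o + 4] = (unpack >>> 12) & 0xF;
      out[o + 5] = (unpack >>> 8) & 0xF;
      out[o + 6] = (unpack >>> 4) & 0xF;
      out[o + 7] = unpack & 0xF;
    }

    // Tail: fewer than 8 values remain, so fall back to single reads.
    while (i < length) {
      out[outPosition + i++] = getScalar(buf, index++);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = new byte[16]; // 32 packed 4-bit values
    for (int k = 0; k < bytes.length; k++) {
      bytes[k] = (byte) (k * 37 + 11);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    long[] out = new long[25];
    getBlocked(buf, out, 0, 3, 25);
    for (int k = 0; k < out.length; k++) {
      System.out.println((3 + k) + " -> " + out[k]);
    }
  }
}
```

In the scalar path, neighbouring values share a byte, so every byte is fetched twice. The blocked path fetches each group of 4 bytes once, which is the kind of overlapping read the unrolling is meant to remove.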
   
   ### Full column scans, before/after, on uniformly distributed columns of varying bit widths, covering the entire set of value decoders
   before:
   ```
   Benchmark                                                           (distribution)  (encoding)  (filteredRowCountPercentage)   (rows)  (zeroProbability)  Mode  Cnt      Score      Error  Units
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-1    lz4-auto                           1.0  5000000                0.0  avgt    5  23862.994 ± 1944.514  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-2    lz4-auto                           1.0  5000000                0.0  avgt    5  23567.778 ±  757.632  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-3    lz4-auto                           1.0  5000000                0.0  avgt    5  29546.975 ± 1916.436  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-4    lz4-auto                           1.0  5000000                0.0  avgt    5  28826.685 ± 3260.379  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-8    lz4-auto                           1.0  5000000                0.0  avgt    5  19829.688 ±  923.047  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-12    lz4-auto                           1.0  5000000                0.0  avgt    5  32419.941 ± 1486.951  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-16    lz4-auto                           1.0  5000000                0.0  avgt    5  22460.206 ± 4207.942  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-20    lz4-auto                           1.0  5000000                0.0  avgt    5  31415.040 ± 3867.699  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-24    lz4-auto                           1.0  5000000                0.0  avgt    5  24444.519 ± 2472.537  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uinform-32    lz4-auto                           1.0  5000000                0.0  avgt    5  18031.908 ± 1471.220  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-40    lz4-auto                           1.0  5000000                0.0  avgt    5  22442.611 ± 1778.178  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-48    lz4-auto                           1.0  5000000                0.0  avgt    5  24733.015 ± 3076.896  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-56    lz4-auto                           1.0  5000000                0.0  avgt    5  23201.264 ± 1578.009  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-64    lz4-auto                           1.0  5000000                0.0  avgt    5  20581.399 ± 1269.190  us/op
   ```
   
   after:
   ```
   Benchmark                                                           (distribution)  (encoding)  (filteredRowCountPercentage)   (rows)  (zeroProbability)  Mode  Cnt      Score      Error  Units
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-1    lz4-auto                           1.0  5000000                0.0  avgt    5  17892.362 ±  102.776  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-2    lz4-auto                           1.0  5000000                0.0  avgt    5  17796.103 ±  417.847  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-3    lz4-auto                           1.0  5000000                0.0  avgt    5  17888.816 ±  631.676  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-4    lz4-auto                           1.0  5000000                0.0  avgt    5  17879.496 ±  237.066  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       uniform-8    lz4-auto                           1.0  5000000                0.0  avgt    5  17508.260 ±  560.856  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-12    lz4-auto                           1.0  5000000                0.0  avgt    5  18272.440 ±   71.751  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-16    lz4-auto                           1.0  5000000                0.0  avgt    5  19042.292 ±  595.685  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-20    lz4-auto                           1.0  5000000                0.0  avgt    5  18782.746 ±  248.738  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-24    lz4-auto                           1.0  5000000                0.0  avgt    5  19048.354 ±  160.025  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uinform-32    lz4-auto                           1.0  5000000                0.0  avgt    5  17984.778 ±  633.691  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-40    lz4-auto                           1.0  5000000                0.0  avgt    5  22070.007 ±  166.035  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-48    lz4-auto                           1.0  5000000                0.0  avgt    5  22052.517 ± 2168.763  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-56    lz4-auto                           1.0  5000000                0.0  avgt    5  25001.739 ±  259.184  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      uniform-64    lz4-auto                           1.0  5000000                0.0  avgt    5  20325.369 ±   60.724  us/op
   ```
   
   ### Full column scans on other value distributions, before/after comparison
   before:
   ```
   Benchmark                                                              (distribution)  (encoding)  (filteredRowCountPercentage)   (rows)  (zeroProbability)  Mode  Cnt      Score      Error  Units
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized     enumerated-0-1    lz4-auto                           1.0  5000000                0.0  avgt    5  24186.788 ± 1440.692  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    enumerated-full    lz4-auto                           1.0  5000000                0.0  avgt    5  27220.865 ± 2221.740  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized        normal-1-32    lz4-auto                           1.0  5000000                0.0  avgt    5  19490.250 ± 1231.429  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized     normal-40-1000    lz4-auto                           1.0  5000000                0.0  avgt    5  20701.792 ± 1898.845  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    sequential-1000    lz4-auto                           1.0  5000000                0.0  avgt    5  28977.482 ±  878.240  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-unique    lz4-auto                           1.0  5000000                0.0  avgt    5  22856.419 ±  247.786  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       zipf-low-100    lz4-auto                           1.0  5000000                0.0  avgt    5  19135.244 ±  437.260  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    zipf-low-100000    lz4-auto                           1.0  5000000                0.0  avgt    5  31696.149 ± 2933.263  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    zipf-low-32-bit    lz4-auto                           1.0  5000000                0.0  avgt    5  25815.759 ± 2073.480  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      zipf-high-100    lz4-auto                           1.0  5000000                0.0  avgt    5  20582.544 ± 1726.414  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized   zipf-high-100000    lz4-auto                           1.0  5000000                0.0  avgt    5  18996.105 ± 1412.307  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized   zipf-high-32-bit    lz4-auto                           1.0  5000000                0.0  avgt    5  18628.002 ±  450.125  us/op
   ```
   
   after:
   ```
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized     enumerated-0-1    lz4-auto                           1.0  5000000                0.0  avgt    5  19250.882 ± 1221.067  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    enumerated-full    lz4-auto                           1.0  5000000                0.0  avgt    5  19853.160 ±  780.701  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized        normal-1-32    lz4-auto                           1.0  5000000                0.0  avgt    5  18414.646 ± 1705.926  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized     normal-40-1000    lz4-auto                           1.0  5000000                0.0  avgt    5  20054.183 ± 2539.912  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    sequential-1000    lz4-auto                           1.0  5000000                0.0  avgt    5  18585.123 ± 1524.581  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-unique    lz4-auto                           1.0  5000000                0.0  avgt    5  19304.239 ±  808.152  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized       zipf-low-100    lz4-auto                           1.0  5000000                0.0  avgt    5  19248.088 ±  470.994  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    zipf-low-100000    lz4-auto                           1.0  5000000                0.0  avgt    5  23815.318 ± 4413.581  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized    zipf-low-32-bit    lz4-auto                           1.0  5000000                0.0  avgt    5  26111.974 ± 4594.582  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized      zipf-high-100    lz4-auto                           1.0  5000000                0.0  avgt    5  20144.349 ±  489.978  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized   zipf-high-100000    lz4-auto                           1.0  5000000                0.0  avgt    5  18956.415 ±  796.136  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized   zipf-high-32-bit    lz4-auto                           1.0  5000000                0.0  avgt    5  18666.583 ±  385.407  us/op
   ```
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] been tested in a test Druid cluster.
   

