gianm opened a new pull request, #12745:
URL: https://github.com/apache/druid/pull/12745

   As we move towards query execution plans that involve more transfer
   of data between servers, it's important to have a data format that
   provides for doing this more efficiently than the options available to
   us today.
   
   This patch adds:
   
   - Columnar frames, which support fast querying. Writes are faster than
     with the segment format. Queries are slower than equivalent operations
     on the segment format, due to the lack of indexes and to design choices
     that favor fast writes as well as reasonably fast reads. Benchmarks are
     below.
   - Row-based frames, which support fast sorting via memory comparison
     and fast whole-row copies via memory copying.
   - Frame files, a container format that can be stored on disk or
     transferred between servers.
   
   The idea is to use row-based frames when data is expected to be
   sorted, and columnar frames when data is expected to be queried.
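   
   To illustrate why memory comparison makes sorting cheap, here is a
   minimal sketch. It is not the frame format from this patch: the class
   `MemoryComparableRows`, its methods, and the layout (fixed-width long
   fields, big-endian, sign bit flipped) are assumptions made up for this
   example. With such an encoding, unsigned byte-wise comparison of two rows
   matches field-by-field numeric order, and copying a row is a plain byte
   copy.

   ```java
   import java.nio.ByteBuffer;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   public class MemoryComparableRows {
       // Hypothetical layout, for illustration only: every field is a long,
       // written big-endian with its sign bit flipped, so that unsigned
       // byte order equals signed numeric order.
       public static byte[] encodeRow(long... fields) {
           // ByteBuffer is big-endian by default.
           ByteBuffer buf = ByteBuffer.allocate(fields.length * Long.BYTES);
           for (long field : fields) {
               buf.putLong(field ^ Long.MIN_VALUE);
           }
           return buf.array();
       }

       // Reverses the encoding for one field of a row.
       public static long decodeField(byte[] row, int fieldIndex) {
           return ByteBuffer.wrap(row).getLong(fieldIndex * Long.BYTES) ^ Long.MIN_VALUE;
       }

       // Sorts rows by comparing raw bytes; no field is decoded.
       public static List<byte[]> sortRows(List<byte[]> rows) {
           List<byte[]> sorted = new ArrayList<>(rows);
           sorted.sort(Arrays::compareUnsigned); // requires Java 9+
           return sorted;
       }

       // A whole-row copy is a single array copy, independent of field count.
       public static byte[] copyRow(byte[] row) {
           return Arrays.copyOf(row, row.length);
       }

       public static void main(String[] args) {
           List<byte[]> rows = List.of(encodeRow(3, 1), encodeRow(-5, 2), encodeRow(3, 0));
           for (byte[] row : sortRows(rows)) {
               System.out.println(decodeField(row, 0) + ", " + decodeField(row, 1));
           }
       }
   }
   ```

   Sorting here never inspects individual fields, which is the property that
   a memory-comparable row layout is meant to provide. A columnar layout
   would instead need to read each sort key from a separate column.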
   
   The code in this patch is not used in production yet. Therefore, the
   patch involves minimal changes outside of the `org.apache.druid.frame`
   package.  The main ones are adjustments to SqlBenchmark to add benchmarks
   for queries on frames, and the addition of a "forEach" method to Sequence.
   
   Future patches in the #12262 sequence will use these frames for data
   transfer and short-term storage.
   
   Benchmarks for queries on frames vs. traditional segments (mmap):
   
   ```
   Benchmark              (query)  (rowsPerSegment)   (storageType)  (vectorize)  Mode  Cnt    Score   Error  Units
   SqlBenchmark.querySql        0           2000000            mmap        false  avgt   15    6.296 ± 0.081  ms/op
   SqlBenchmark.querySql        0           2000000       frame-row        false  avgt   15   88.495 ± 0.579  ms/op
   SqlBenchmark.querySql        0           2000000  frame-columnar        false  avgt   15   13.715 ± 0.562  ms/op
   SqlBenchmark.querySql       10           2000000            mmap        false  avgt   15  251.530 ± 4.862  ms/op
   SqlBenchmark.querySql       10           2000000       frame-row        false  avgt   15  626.003 ± 4.862  ms/op
   SqlBenchmark.querySql       10           2000000  frame-columnar        false  avgt   15  466.353 ± 0.603  ms/op
   SqlBenchmark.querySql       18           2000000            mmap        false  avgt   15  172.775 ± 0.890  ms/op
   SqlBenchmark.querySql       18           2000000       frame-row        false  avgt   15  225.835 ± 2.350  ms/op
   SqlBenchmark.querySql       18           2000000  frame-columnar        false  avgt   15  177.613 ± 1.210  ms/op
   
   Benchmark              (query)  (rowsPerSegment)   (storageType)  (vectorize)  Mode  Cnt    Score    Error  Units
   SqlBenchmark.querySql        0           2000000            mmap        force  avgt   15    0.509 ±  0.013  ms/op
   SqlBenchmark.querySql        0           2000000  frame-columnar        force  avgt   15    7.524 ±  0.123  ms/op
   SqlBenchmark.querySql       10           2000000            mmap        force  avgt   15  174.626 ± 10.985  ms/op
   SqlBenchmark.querySql       10           2000000  frame-columnar        force  avgt   15  455.922 ± 19.296  ms/op
   SqlBenchmark.querySql       18           2000000            mmap        force  avgt   15   38.537 ±  1.182  ms/op
   SqlBenchmark.querySql       18           2000000  frame-columnar        force  avgt   15   50.755 ±  0.751  ms/op
   ```


