clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-409771369
 
 
   I did some digging and the more straightforward solution to this problem is 
to just store the rows sorted in `PlainFactsHolder` similar to how 
`RollupFactsHolder` is currently doing, changing from `ConcurrentMap<Long, 
Deque<IncrementalIndexRow>>` to `ConcurrentMap<IncrementalIndexRow, 
Deque<IncrementalIndexRow>>` and using the `IncrementalIndexRowComparator`. I 
think the alternative is finding all calls to `FactsHolder.iterator` and 
`FactsHolder.keySet` used for persist and replacing those with a version that 
sorts `IncrementalIndexRow` with the comparator (unless I'm missing another 
simpler place where this could be done). 
   
   I did some benchmarking and there does appear to be some cost to doing the 
sorting up front, I'm not sure on the range of what is an acceptable amount of 
overhead.
   
   ```
   Benchmark                                          (rollupOption)  Mode  Cnt 
  Score   Error  Units
   IncrementalIndexRowTypeBenchmark.normalFloats              rollup  avgt   40 
 19.965 ± 1.780  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats           no-rollup  avgt   40 
 10.596 ± 1.055  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats   ordered-no-rollup  avgt   40 
 21.925 ± 1.323  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs               rollup  avgt   40 
 19.108 ± 1.286  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs            no-rollup  avgt   40 
 10.107 ± 1.042  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs    ordered-no-rollup  avgt   40 
 21.967 ± 1.406  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings             rollup  avgt   40 
 20.489 ± 2.442  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings          no-rollup  avgt   40 
  9.352 ± 0.105  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings  ordered-no-rollup  avgt   40 
 20.986 ± 0.508  us/op
   ```
   
   ```
   Benchmark                                  (numSegments)     (rollupSchema)  
(rowsPerSegment)        (schemaAndQuery)  (threshold)  Mode  Cnt        Score   
    Error  Units
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup  
          750000                 basic.A           10  avgt   25   950647.742 ± 
20969.530  us/op
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup  
          750000       basic.numericSort           10  avgt   25   230487.526 ± 
29340.615  us/op
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup  
          750000  basic.alphanumericSort           10  avgt   25   218782.138 ± 
 6203.484  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup  
          750000                 basic.A           10  avgt   25   945842.924 ± 
12074.015  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup  
          750000       basic.numericSort           10  avgt   25   222019.610 ± 
 3486.365  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup  
          750000  basic.alphanumericSort           10  avgt   25   223015.114 ± 
 3130.184  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup  
          750000                 basic.A           10  avgt   25  1347085.823 ± 
12655.001  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup  
          750000       basic.numericSort           10  avgt   25   204926.129 ± 
 4846.150  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup  
          750000  basic.alphanumericSort           10  avgt   25   201050.213 ± 
 6559.034  us/op
   ```
   ```
   Benchmark                                        (numSegments)     
(rollupSchema)  (rowsPerSegment)              (schemaAndQuery)  Mode  Cnt       
 Score       Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex              1          
no-rollup            750000                       basic.A  avgt   25   
921919.453 ± 25357.840  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          
no-rollup            750000       basic.timeFilterNumeric  avgt   25    
69240.969 ±  1403.393  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          
no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   
152974.422 ±  2181.950  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          
no-rollup            750000    basic.timeFilterByInterval  avgt   25    
16752.936 ±   406.768  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  
ordered-no-rollup            750000                       basic.A  avgt   25   
906129.041 ± 19497.575  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  
ordered-no-rollup            750000       basic.timeFilterNumeric  avgt   25    
66989.537 ±  1249.002  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  
ordered-no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   
153816.935 ±  2080.406  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  
ordered-no-rollup            750000    basic.timeFilterByInterval  avgt   25    
16650.825 ±   271.827  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             
rollup            750000                       basic.A  avgt   25  1410127.820 
± 19685.994  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             
rollup            750000       basic.timeFilterNumeric  avgt   25    48694.028 
±   865.701  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             
rollup            750000  basic.timeFilterAlphanumeric  avgt   25   138381.904 
±  1284.566  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             
rollup            750000    basic.timeFilterByInterval  avgt   25    14172.496 
±  1020.567  us/op
   ```
   ```
   Benchmark                                     (defaultStrategy)  
(initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)     
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt      Score      
Error  Units
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all          
no-rollup            100000           basic.A  avgt   25  53761.212 ± 2854.750  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all          
no-rollup            100000      basic.nested  avgt   25  71957.267 ± 3881.153  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all  
ordered-no-rollup            100000           basic.A  avgt   25  63418.312 ± 
9735.523  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all  
ordered-no-rollup            100000      basic.nested  avgt   25  79107.369 ± 
3597.208  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all             
rollup            100000           basic.A  avgt   25  57728.209 ± 3683.978  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 all             
rollup            100000      basic.nested  avgt   25  77225.014 ± 4820.121  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day          
no-rollup            100000           basic.A  avgt   25  60686.368 ± 3545.676  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day          
no-rollup            100000      basic.nested  avgt   25  73173.081 ± 3365.438  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day  
ordered-no-rollup            100000           basic.A  avgt   25  67065.055 ± 
2742.212  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day  
ordered-no-rollup            100000      basic.nested  avgt   25  78658.129 ± 
4969.871  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day             
rollup            100000           basic.A  avgt   25  62835.895 ± 3471.298  
us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2              
  -1                       4              4                 day             
rollup            100000      basic.nested  avgt   25  78779.644 ± 5321.856  
us/op
   ```
   The difference is most apparent on adding rows where performance is similar 
to performance of rollup enabled, and in group by queries, where performance is 
slightly slower than if rollup were enabled. I would also expect some slight 
increased memory usage with this approach due to increased number of shorter 
length `Deque` objects from a larger number of key entries in the facts holder 
map.
   
   I also haven't done any measurement on size differences of persisted 
segments yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to