gianm commented on issue #7607: thetaSketch(with sketches-core-0.13.1) in 
groupBy always return value no more than 16384
URL: 
https://github.com/apache/incubator-druid/issues/7607#issuecomment-490573859
 
 
   @leerho, please let me know if the following is helpful, or if I could do 
anything else to help.
   
   What the Druid query is doing is something like this:
   
   1. Iterating over all rows in a Druid segment, and building up a theta 
sketch object. This object looks fine.
   2. Taking that object and merging it into the 'merge buffer', which starts 
off initialized to an empty sketch. This is where it goes off the rails.
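
For reference, the two steps above can be modeled with a toy theta sketch. This is only a sketch under stated assumptions, not the DataSketches implementation: `ToyThetaUnion`, `ToySketch`, and `scannedExample` are hypothetical names, hashes are modeled as doubles in (0, 1), and the nominal-entries cap of a real union is ignored. It shows what step (2) is expected to do: a union with an empty sketch should keep the smaller theta, so the estimate survives the merge.

```java
import java.util.TreeSet;

// Toy model only: NOT the DataSketches implementation.
final class ToyThetaUnion {

    // A toy theta sketch: a threshold theta in (0, 1] and the retained hashes below it.
    static final class ToySketch {
        final double theta;
        final TreeSet<Double> hashes;

        ToySketch(double theta, TreeSet<Double> hashes) {
            this.theta = theta;
            this.hashes = hashes;
        }

        // Estimated distinct count: retained entries scaled up by 1 / theta.
        double estimate() {
            return hashes.size() / theta;
        }
    }

    // The merge buffer's starting state: theta = 1.0 and no retained hashes.
    static ToySketch empty() {
        return new ToySketch(1.0, new TreeSet<>());
    }

    // Union: take the smaller theta and keep only the hashes below it.
    static ToySketch union(ToySketch a, ToySketch b) {
        double theta = Math.min(a.theta, b.theta);
        TreeSet<Double> merged = new TreeSet<>();
        for (double h : a.hashes) {
            if (h < theta) {
                merged.add(h);
            }
        }
        for (double h : b.hashes) {
            if (h < theta) {
                merged.add(h);
            }
        }
        return new ToySketch(theta, merged);
    }

    // Stand-in for step 1: a sketch that retained 4 hashes with theta = 0.25.
    static ToySketch scannedExample() {
        TreeSet<Double> hashes = new TreeSet<>();
        hashes.add(0.03125);
        hashes.add(0.0625);
        hashes.add(0.125);
        hashes.add(0.1875);
        return new ToySketch(0.25, hashes);
    }

    public static void main(String[] args) {
        // Step 2: merge the scanned sketch into an initially empty union.
        ToySketch merged = union(empty(), scannedExample());
        System.out.println("theta=" + merged.theta + " estimate=" + merged.estimate());
    }
}
```

In this model the merged sketch keeps theta = 0.25. A merge that left theta at 1.0 would report only the raw retained count.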
   
   I scattered a bunch of sketch toStrings around the code and found that in 
step (2) they look like this:
   
   **The object built up from the segment scan,**
   
   ```
   ### HeapCompactOrderedSketch SUMMARY:
      Estimate                : 100086.81001356241
      Upper Bound, 95% conf   : 101530.78009013624
      Lower Bound, 95% conf   : 98663.31883662633
      Theta (double)          : 0.16369789383615946
      Theta (long)            : 1509846576500454824
      Theta (long) hex        : 14f40d7639a635a8
      EstMode?                : true
      Empty?                  : false
      Array Size Entries      : 16384
      Retained Entries        : 16384
      Seed Hash               : 93cc | 37836
   ### END SKETCH SUMMARY
   ```
   
   **The initial state of the sketch in the merge buffer (should be empty),**
   
   ```
   ### HeapCompactOrderedSketch SUMMARY:
      Estimate                : 0.0
      Upper Bound, 95% conf   : 0.0
      Lower Bound, 95% conf   : 0.0
      Theta (double)          : 1.0
      Theta (long)            : 9223372036854775807
      Theta (long) hex        : 7fffffffffffffff
      EstMode?                : false
      Empty?                  : true
      Array Size Entries      : 0
      Retained Entries        : 0
      Seed Hash               : 93cc | 37836
   ### END SKETCH SUMMARY
   ```
   
   **The final state of the sketch in the merge buffer (should match the 
original sketch from the segment scan),**
   
   ```
   ### HeapCompactOrderedSketch SUMMARY:
      Estimate                : 16384.0
      Upper Bound, 95% conf   : 16384.0
      Lower Bound, 95% conf   : 16384.0
      Theta (double)          : 1.0
      Theta (long)            : 9223372036854775807
      Theta (long) hex        : 7fffffffffffffff
      EstMode?                : false
      Empty?                  : false
      Array Size Entries      : 16384
      Retained Entries        : 16384
      Seed Hash               : 93cc | 37836
   ### END SKETCH SUMMARY
   ```
   
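   It's changed a bit, but doesn't match up: the union result now holds the 16384 retained entries, but theta is still 1.0 and EstMode is false, so the estimate is stuck at 16384 instead of roughly 100086.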
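
The numbers in the three summaries can be cross-checked with a little arithmetic. The sketch below assumes the usual theta sketch relationships: theta as a double is the long theta divided by `Long.MAX_VALUE`, and the estimate is the retained entry count divided by theta. `ThetaSummaryCheck` and its methods are hypothetical helper names and use only the JDK, not the sketches library.

```java
// Hypothetical helper that re-derives the values printed in the summaries.
final class ThetaSummaryCheck {

    // Theta as a fraction in (0, 1]: the long theta scaled by Long.MAX_VALUE.
    static double thetaAsDouble(long thetaLong) {
        return thetaLong / (double) Long.MAX_VALUE;
    }

    // Estimated distinct count: retained entries divided by theta.
    static double estimate(int retainedEntries, double theta) {
        return retainedEntries / theta;
    }

    public static void main(String[] args) {
        // Sketch from the segment scan: "Theta (long)" and "Retained Entries".
        double scanTheta = thetaAsDouble(1509846576500454824L);
        System.out.println("segment scan: theta=" + scanTheta
            + " estimate=" + estimate(16384, scanTheta));

        // Final merge buffer state: same retained entries, but theta left at the maximum.
        double mergedTheta = thetaAsDouble(Long.MAX_VALUE);
        System.out.println("merge buffer: theta=" + mergedTheta
            + " estimate=" + estimate(16384, mergedTheta));
    }
}
```

With theta near 0.1637 the 16384 retained entries scale up to about 100086. With theta at 1.0 the same entries are counted as-is, which is consistent with the 16384 ceiling reported in this issue.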
   
   The code that printed this was the `aggregate` method in 
SketchBufferAggregator, which looks like this after the debugging code I added:
   
   ```java
     @Override
     public void aggregate(ByteBuffer buf, int position)
     {
       Object update = selector.getObject();
       if (update == null) {
         return;
       }
   
       Union union = getOrCreateUnion(buf, position);
       final String initialUnionResult =
           update instanceof SketchHolder ? union.getResult().toString() : null;
   
       SketchAggregator.updateUnion(union, update);
   
       if (update instanceof SketchHolder) {
         log.info(
             "Aggregate called with buffer[%s], position[%s], update = %s, union starts as = %s, union ends as = %s",
             System.identityHashCode(buf),
             position,
             ((SketchHolder) update).getSketch(),
             initialUnionResult,
             union.getResult()
         );
       }
     }
   ```
