gianm commented on issue #7607: thetaSketch(with sketches-core-0.13.1) in groupBy always return value no more than 16384
URL: https://github.com/apache/incubator-druid/issues/7607#issuecomment-490573859

@leerho, please let me know if the following is helpful, or if I could do anything else to help.

What the Druid query is doing is something like this:

1. Iterating over all rows in a Druid segment, and building up a theta sketch object. This object looks fine.
2. Taking that object and merging it into the 'merge buffer', which starts off initialized to an empty sketch. This is where it goes off the rails.

I scattered a bunch of sketch toStrings around the code and found that in step (2) they look like this:

**The object built up from the segment scan,**

```
### HeapCompactOrderedSketch SUMMARY:
   Estimate                : 100086.81001356241
   Upper Bound, 95% conf   : 101530.78009013624
   Lower Bound, 95% conf   : 98663.31883662633
   Theta (double)          : 0.16369789383615946
   Theta (long)            : 1509846576500454824
   Theta (long) hex        : 14f40d7639a635a8
   EstMode?                : true
   Empty?                  : false
   Array Size Entries      : 16384
   Retained Entries        : 16384
   Seed Hash               : 93cc | 37836
### END SKETCH SUMMARY
```

**The initial state of the sketch in the merge buffer (should be empty),**

```
### HeapCompactOrderedSketch SUMMARY:
   Estimate                : 0.0
   Upper Bound, 95% conf   : 0.0
   Lower Bound, 95% conf   : 0.0
   Theta (double)          : 1.0
   Theta (long)            : 9223372036854775807
   Theta (long) hex        : 7fffffffffffffff
   EstMode?                : false
   Empty?                  : true
   Array Size Entries      : 0
   Retained Entries        : 0
   Seed Hash               : 93cc | 37836
### END SKETCH SUMMARY
```

**The final state of the sketch in the merge buffer (should match the original sketch from the segment scan),**

```
### HeapCompactOrderedSketch SUMMARY:
   Estimate                : 16384.0
   Upper Bound, 95% conf   : 16384.0
   Lower Bound, 95% conf   : 16384.0
   Theta (double)          : 1.0
   Theta (long)            : 9223372036854775807
   Theta (long) hex        : 7fffffffffffffff
   EstMode?                : false
   Empty?                  : false
   Array Size Entries      : 16384
   Retained Entries        : 16384
   Seed Hash               : 93cc | 37836
### END SKETCH SUMMARY
```

It's changed a bit, but doesn't match up.

The code that printed this was the `aggregate` method in SketchBufferAggregator, which looks like this after the debugging code I added:

```java
@Override
public void aggregate(ByteBuffer buf, int position)
{
  Object update = selector.getObject();
  if (update == null) {
    return;
  }

  Union union = getOrCreateUnion(buf, position);
  final String initialUnionResult = update instanceof SketchHolder ? union.getResult().toString() : null;
  SketchAggregator.updateUnion(union, update);

  if (update instanceof SketchHolder) {
    log.info(
        "Aggregate called with buffer[%s], position[%s], update = %s, union starts as = %s, union ends as = %s",
        System.identityHashCode(buf),
        position,
        ((SketchHolder) update).getSketch(),
        initialUnionResult,
        union.getResult()
    );
  }
}
```
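
As a sanity check on the three summaries above, the numbers can be reproduced by hand. This is a minimal sketch that assumes the usual theta sketch estimator (retained entries divided by theta, with theta taken as the long value over `Long.MAX_VALUE`). The class and method names are hypothetical and it does not call the sketches-core library at all; it only redoes the arithmetic on the values printed above.

```java
// Hypothetical helper, not part of Druid or sketches-core.
// Recomputes the printed estimates from "Retained Entries" and "Theta".
public class ThetaEstimateCheck {

    // Assumption: theta (double) is theta (long) as a fraction of Long.MAX_VALUE.
    public static double thetaFromLong(long thetaLong) {
        return thetaLong / (double) Long.MAX_VALUE;
    }

    // Assumption: the estimate of a non-empty sketch is retained / theta.
    public static double estimate(int retainedEntries, double theta) {
        return retainedEntries / theta;
    }

    public static void main(String[] args) {
        // Values copied from the segment-scan sketch summary.
        double scanTheta = thetaFromLong(1509846576500454824L);
        System.out.println("segment scan theta    = " + scanTheta);
        System.out.println("segment scan estimate = " + estimate(16384, scanTheta));

        // Values copied from the final merge-buffer sketch summary:
        // same 16384 retained entries, but theta is back at 1.0.
        double mergedTheta = thetaFromLong(Long.MAX_VALUE);
        System.out.println("merge buffer theta    = " + mergedTheta);
        System.out.println("merge buffer estimate = " + estimate(16384, mergedTheta));
    }
}
```

Under those assumptions, the merged sketch kept the 16384 retained entries but not the theta of the incoming sketch, which would explain why the result never exceeds 16384.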
