leerho commented on issue #9806:
URL: https://github.com/apache/druid/issues/9806#issuecomment-625989809


   @gianm Thanks for the tutorial and insights.  I have been trying to grok the 
Dump Segment tool and it is a challenge, as it has scores of dependencies on 
Druid internals that I don't understand and really don't want to mess with.
   
   The query/aggregation/datasketches code on the other hand has several 
advantages.  Each sketch is already separated into its own aggregator and the 
aggregator code, by definition, has intimate knowledge of a particular sketch.  
Adding a `public abstract String toString()` method to the ComplexMetricsSerde 
(CMS) makes a whole lot of sense.  Then, as you say, the Dump Segment tool 
could rely on that.  
   
   It turns out that all of our sketches already have "toString(...)" methods 
that return human-readable summaries that we use for debugging.  Depending on 
the sketch, some of these toString() methods require additional parameters for 
different output options; others do not.  But the CMS impl would know how to call 
these diagnostic methods for each sketch.  The output would appear like the 
"HllSketch 6" sample above.  It would not be in a neat matrix form like I was 
envisioning, but, hey, it is more than we have now, and involves only minor 
changes to the CMS impls, and a minor change to the Dump Segment tool and 
that's it!  It would be useful for the dump tool to spit out a row number 
before it calls the CMS toString() method.
   
   If someone on the Druid team could make the change in CMS and in the dump 
tool, then either Alex or I could add the toString() methods to the CMS impls 
in query/aggregation/datasketches. 
   
   This, of course, ripples through all of your CMS implementations and not 
just DataSketches, so someone would have to work on that; but the default could 
be to print "no debug info available" for complex metrics that don't have 
diagnostic methods like we do.
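
   A rough sketch of the shape this could take, under stated assumptions.  None 
of these classes are the real Druid or DataSketches types: `DebuggableSerde` 
stands in for ComplexMetricsSerde, `FakeSketch` for a sketch with a diagnostic 
summary method, and `DumpSketch` for the dump tool loop.  The debug method here 
takes the metric object as a parameter and is named `toDebugString`, which is 
also an assumption; a no-argument `toString()` would coincide with 
`Object.toString()` and could not see the value being printed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ComplexMetricsSerde; not the real Druid class.
abstract class DebuggableSerde {
    // Default for complex metrics that have no diagnostic method.
    String toDebugString(Object metric) {
        return "no debug info available";
    }
}

// Hypothetical stand-in for a sketch whose summary method takes an option.
final class FakeSketch {
    private final int lgK;
    private final long estimate;

    FakeSketch(int lgK, long estimate) {
        this.lgK = lgK;
        this.estimate = estimate;
    }

    String summary(boolean verbose) {
        String base = "FakeSketch lgK=" + lgK + " estimate=" + estimate;
        return verbose ? base + " (verbose)" : base;
    }
}

// The per-sketch serde knows which diagnostic call and options to use.
final class FakeSketchSerde extends DebuggableSerde {
    @Override
    String toDebugString(Object metric) {
        return ((FakeSketch) metric).summary(false);
    }
}

// Hypothetical stand-in for the dump tool: row number first, then the summary.
final class DumpSketch {
    static List<String> dumpColumn(DebuggableSerde serde, List<?> rows) {
        List<String> out = new ArrayList<>();
        for (int row = 0; row < rows.size(); row++) {
            out.add("row " + row + ": " + serde.toDebugString(rows.get(row)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<FakeSketch> rows = Arrays.asList(new FakeSketch(12, 100L), new FakeSketch(12, 2500L));
        for (String line : dumpColumn(new FakeSketchSerde(), rows)) {
            System.out.println(line);
        }
    }
}
```

   With a non-abstract default like this, the other CMS implementations would 
not have to change until someone wants to add real diagnostics for them.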
   
   Lee.
   
   




