leerho commented on issue #9806: URL: https://github.com/apache/druid/issues/9806#issuecomment-625989809
@gianm Thanks for the tutorial and insights. I have been trying to grok the Dump Segment tool and it is a challenge, as it has scores of dependencies on Druid internals that I don't understand and really don't want to mess with.

The query/aggregation/datasketches code, on the other hand, has several advantages. Each sketch is already separated into its own aggregator, and the aggregator code, by definition, has intimate knowledge of a particular sketch. Adding a `public abstract String toString()` method to the ComplexMetricsSerde (CMS) makes a whole lot of sense. Then, as you say, the Dump Segment tool could rely on that.

It turns out that all of our sketches already have `toString(...)` methods that return human-readable summaries that we use for debugging. Depending on the sketch, some of these `toString()` methods require additional parameters for different output options and some do not. But the CMS impl would know how to call these diagnostic methods for each sketch. The output would appear like the "HllSketch 6" sample above. It would not be in a neat matrix form like I was envisioning, but, hey, it is more than we have now, and it involves only minor changes to the CMS impls and a minor change to the Dump Segment tool, and that's it! It would be useful for the dump tool to print a row number before it calls the CMS `toString()` method.

If someone on the Druid team could make the change in CMS and in the dump tool, then either Alex or I could add the `toString()` methods to the CMS impls in query/aggregation/datasketches. This, of course, ripples through all of your CMS implementations and not just DataSketches, so someone would have to work on that; but the default could be to print "no debug info available" for complex metrics that don't have diagnostic methods like we do.

Lee.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
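A minimal sketch of the proposal in the comment above, written as self-contained Java. Every name here (`DebuggableSerde`, `toDebugString`, `ToySketch`, `dumpColumn`) is hypothetical and stands in for the real Druid and DataSketches classes, which are not used. It only illustrates the shape of the idea: a serde-level debug hook with a "no debug info available" default, and a dump loop that prints a row number before each value.

```java
import java.util.List;

public class DumpSegmentSketch {

    /** Hypothetical stand-in for a complex-metric serde; not Druid's real class. */
    public abstract static class DebuggableSerde {
        /** Default for complex metrics that have no diagnostic method. */
        public String toDebugString(Object value) {
            return "no debug info available";
        }
    }

    /** Toy sketch type standing in for a real sketch with a diagnostic toString. */
    public static final class ToySketch {
        private final int lgK;
        private final long estimate;

        public ToySketch(int lgK, long estimate) {
            this.lgK = lgK;
            this.estimate = estimate;
        }

        /** Human-readable summary; the flag mimics an optional output parameter. */
        public String toString(boolean detail) {
            String summary = "ToySketch lgK=" + lgK + " estimate=" + estimate;
            return detail ? summary + " (detail)" : summary;
        }
    }

    /** The serde knows its sketch type, so it knows which diagnostic call to make. */
    public static final class ToySketchSerde extends DebuggableSerde {
        @Override
        public String toDebugString(Object value) {
            return ((ToySketch) value).toString(false);
        }
    }

    /** A serde with no diagnostic support; inherits the default message. */
    public static final class OpaqueSerde extends DebuggableSerde {
    }

    /** Hypothetical dump loop: prints a row number, then the serde's debug string. */
    public static String dumpColumn(DebuggableSerde serde, List<?> rows) {
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < rows.size(); row++) {
            out.append("row ").append(row).append(": ")
               .append(serde.toDebugString(rows.get(row)))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpColumn(new ToySketchSerde(),
                List.of(new ToySketch(12, 1000L), new ToySketch(12, 42L))));
        System.out.print(dumpColumn(new OpaqueSerde(), List.of(new Object())));
    }
}
```

The hook is a separate method with a default body here, not an abstract `toString()`, only so that serdes without diagnostics need no change; whether the real change should be abstract or defaulted is left open in the comment.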
