gianm opened a new issue #6814: [Discuss] Replacing hyperUnique as 'default' 
distinct count sketch
URL: https://github.com/apache/incubator-druid/issues/6814
 
 
   Branching off a discussion from #6743. See @leerho's comment 
https://github.com/apache/incubator-druid/issues/6743#issuecomment-449490568 
for the rationale for why it could make sense to transition to HllSketch. 
However, it would be a delicate transition, requiring solutions to the 
following problems.
   
   - The on-disk formats are incompatible, and cannot be made compatible, 
because the two sketches use different hash functions. The migration would 
therefore be an extended one, and we should expect that some users will never 
migrate, because they cannot reindex their data. (Not all users retain copies 
of their raw data, even though doing so is a best practice.)
   - The fact that the new sketch lives in an extension while the old one is 
in core invites user confusion. Ideally, both would be in core, or both in 
extensions.
   - Druid SQL's `COUNT(DISTINCT x)` operator currently uses hyperUnique. 
Ideally, when run on a complex column, it would adapt to whichever sketch 
aggregator matches your segments; and when run on strings, it would use the 
'best' available sketch.
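   To make the first point concrete, here is a toy HyperLogLog-style sketch 
(a simplified illustration, not Druid's or DataSketches' actual 
implementation) showing why two sketches built with different hash functions 
produce different register contents for the same data, so their serialized 
forms cannot be read or merged interchangeably:

```python
import hashlib

def toy_hll_registers(values, hash_name, p=12):
    """Build the register array of a toy HyperLogLog-style sketch.

    Each value is hashed; the first p bits pick a register, and the
    register stores the maximum "rank" (leading-zero count + 1) of the
    remaining bits. The choice of hash function determines both the
    register index and the rank, so it is baked into the stored data.
    """
    m = 1 << p
    regs = [0] * m
    for v in values:
        h = int.from_bytes(hashlib.new(hash_name, v.encode()).digest()[:8], "big")
        idx = h >> (64 - p)                       # first p bits pick the register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # leading zeros + 1
        regs[idx] = max(regs[idx], rank)
    return regs

vals = [f"user-{i}" for i in range(1000)]
a = toy_hll_registers(vals, "md5")
b = toy_hll_registers(vals, "sha1")
# Same input data, different hash functions -> different register contents,
# so the two serialized sketches are mutually unreadable and unmergeable.
print(a == b)  # False
```

   (The hash functions here are stand-ins; the point is only that the hash is 
baked into the on-disk bytes, which is why no byte-level converter between the 
two formats can exist.)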
   
   Alternative approaches?
   
   - Patching hyperUnique's implementation to improve its error 
characteristics. I'm not sure whether this is possible while retaining the 
same on-disk format; if not, it would require the ability to read both the 
current format and a new format.
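   One common way to support reading both formats side by side is to dispatch 
on a leading version byte. The sketch below is hypothetical (the tags, layout, 
and `read_sketch` function are illustrative assumptions, not Druid's actual 
serialization):

```python
# Hypothetical on-disk layout: one version byte followed by the payload.
V1_LEGACY = 0x01  # assumed tag for the existing hyperUnique format
V2_NEW = 0x02     # assumed tag for a format with improved error characteristics

def read_sketch(buf: bytes):
    """Dispatch on a leading version byte so old and new formats coexist.

    Old segments keep their original tag and are decoded by the legacy
    path; newly written segments carry the new tag. Queries can then span
    a mix of old and new segments during an extended migration.
    """
    if not buf:
        raise ValueError("empty sketch buffer")
    version, payload = buf[0], buf[1:]
    if version == V1_LEGACY:
        return ("legacy", payload)   # decode with the old hash/layout
    if version == V2_NEW:
        return ("new", payload)      # decode with the improved layout
    raise ValueError(f"unknown sketch format version: {version}")

print(read_sketch(bytes([V1_LEGACY]) + b"registers...")[0])  # legacy
```

   The catch, as noted above, is that version dispatch only helps with 
*reading*: because the hash functions differ, a legacy payload still cannot be 
upgraded in place or merged with a new-format sketch.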
