gianm commented on issue #6814: [Discuss] Replacing hyperUnique as 'default' distinct count sketch
URL: https://github.com/apache/incubator-druid/issues/6814#issuecomment-454881037

> As such I do not think removing Druid-HLL is a valid path forward.

100% agree. We do not ever want to break compatibility with existing on-disk segments, so we need to keep it.

> I think the module system is pretty solid in Druid (despite a few hadoop related issues), so I would prefer if HLL were moved to a core module instead of the other way around.

Druid core does need some kind of HLL implementation, so they can't all be moved to modules. It's used by DetermineHashedPartitionsJob and IndexTask to determine partitions. We might end up wanting to use it for other things too, like tracking dimension cardinality statistics on the broker to aid in query planning.

Another consideration is that HLL-based counting functionality is a pretty standard thing in databases these days. Offering it as a core thing is nearly universal.

A final consideration is the interaction with Druid SQL. In Druid SQL, by default, `COUNT(DISTINCT x)` uses HLL. You can set `useApproximateCountDistinct = false` but by default it's true. There have been objections raised to this (https://github.com/apache/incubator-druid/issues/6814#issuecomment-452056269), but I think it's a good choice for Druid, since it adheres to the Druid philosophy that fast is best, and approximations are how you get there. I feel that doing it in exact mode by default would be un-Druidy. It affects the decision of where to put HLL, since the strategy of count-distinct being approximate by default is only feasible if there's an HLL implementation in core Druid.
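
For anyone following along who wants to see the `useApproximateCountDistinct` switch in practice, here is a minimal sketch of building a request body for the Druid SQL HTTP endpoint (`POST /druid/v2/sql`) with that context parameter set. The `build_sql_request` helper, the `"events"` table, and the `"user"` column are hypothetical names made up for illustration; only the `query` and `context` fields and the `useApproximateCountDistinct` key come from Druid itself. The sketch only constructs the JSON and does not contact a cluster.

```python
import json


def build_sql_request(query: str, approximate: bool = True) -> str:
    """Return a JSON body for Druid's SQL endpoint (POST /druid/v2/sql).

    `approximate` maps to the `useApproximateCountDistinct` context
    parameter; True mirrors the default described in the comment above.
    """
    payload = {
        "query": query,
        "context": {"useApproximateCountDistinct": approximate},
    }
    return json.dumps(payload)


# Hypothetical table and column names, used only for illustration.
sql = 'SELECT COUNT(DISTINCT "user") FROM "events"'

# Default behavior: COUNT(DISTINCT ...) is planned as an approximate (HLL) count.
approx_body = build_sql_request(sql)

# Opting out: ask for an exact distinct count instead.
exact_body = build_sql_request(sql, approximate=False)

print(approx_body)
print(exact_body)
```

The two bodies differ only in the context flag, which is the point being made: approximate counting is the default, and exact counting is something a query opts into.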
