pdeva opened a new issue #7337: DataSketches HLL is not a replacement for 
Cardinality aggregator
URL: https://github.com/apache/incubator-druid/issues/7337
 
 
   ### Description
   The `Cardinality` aggregator is deprecated in 0.14 in favor of
`DataSketches HLL`.
   
   However, `DataSketches HLL` requires you to define the aggregation at
ingestion time, which severely limits where it can be used.
   
   What if, after you have already been ingesting data, you decide you want to
calculate the cardinality of some columns? You can do that with the
`Cardinality` aggregator.
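
   To illustrate the query-time case, here is a minimal sketch that builds a native timeseries query using the `cardinality` aggregator. The datasource name `events`, the column `user_id`, and the interval are hypothetical placeholders, and the helper `cardinality_query` is an illustrative name, not part of Druid.

```python
import json


def cardinality_query(datasource, column, interval):
    """Build a native timeseries query that estimates the number of
    distinct values of `column` at query time."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            {
                # Works on any existing dimension; nothing has to be
                # declared in the ingestion spec beforehand.
                "type": "cardinality",
                "name": f"distinct_{column}",
                "fields": [column],
                "byRow": False,
            }
        ],
    }


# Hypothetical datasource, column, and interval, for illustration only.
query = cardinality_query("events", "user_id", "2019-03-01/2019-03-08")
print(json.dumps(query, indent=2))
```

   The resulting JSON could then be POSTed to a broker's native query endpoint; the point is only that the column is chosen when the query is written.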
    
   ### Motivation
   
   `DataSketches HLL` requires you to specify, at ingestion time, the columns
over which cardinality will be calculated. This limits its usefulness if you
later decide to calculate cardinality over a different column in your application.
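
   For comparison, the ingestion-time workflow described here can be sketched as follows, assuming the `druid-datasketches` extension is loaded. The column `user_id`, the `_hll` naming suffix, and both helper functions are hypothetical and chosen only for illustration.

```python
import json


def hll_ingestion_metric(column, lg_k=12):
    """Entry for the ingestion spec's metricsSpec: builds an HLL sketch
    of `column` while the data is being ingested."""
    return {
        "type": "HLLSketchBuild",
        "name": f"{column}_hll",  # hypothetical naming convention
        "fieldName": column,
        "lgK": lg_k,
    }


def hll_query_aggregation(column):
    """Query-time aggregation that merges the sketches stored under the
    metric created by hll_ingestion_metric."""
    return {
        "type": "HLLSketchMerge",
        "name": f"distinct_{column}",
        "fieldName": f"{column}_hll",
    }


# The column has to be chosen up front, when the ingestion spec is written.
metrics_spec = [hll_ingestion_metric("user_id")]
aggregations = [hll_query_aggregation("user_id")]
print(json.dumps({"metricsSpec": metrics_spec, "aggregations": aggregations}, indent=2))
```

   In this sketch the query-side aggregation only works for columns that already have a matching entry in `metricsSpec`.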
   
   Since the `Cardinality` aggregator has no such limitation, it should _not_
be deprecated; it should be kept alongside `DataSketches HLL`. Those who want
ultra-fast cardinality queries over specific columns can pre-specify them during
ingestion. For everyone else, the `Cardinality` aggregator provides a good fallback.
   

