[GitHub] [pinot] kishoreg edited a comment on pull request #8074: For DISTINCT_COUNT, automatically convert Set to HyperLogLog when cardinality is too high

GitBox Wed, 26 Jan 2022 15:16:23 -0800


kishoreg edited a comment on pull request #8074:
URL: https://github.com/apache/pinot/pull/8074#issuecomment-1022688453



   Can we list all the options we have?
   1. Automatically convert the set to HyperLogLog after a threshold
      a. The threshold is set to something like 100K by default.
      b. The threshold is set to -1 by default, which means the feature is off and folks can change it.
      c. The user has the ability to control the threshold through a query option (enable_approx_distinct_threshold=100,000).
   
   2. Return an error if the threshold is reached
      a. The user then switches to distinctcounthll.
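   A minimal Python sketch of the two options is below. Pinot itself is written in Java, so this is illustrative only. `DistinctCounter`, `HyperLogLog`, `fail_on_threshold` and `DistinctThresholdExceeded` are hypothetical names. The HyperLogLog is a textbook version with 2^12 registers, not Pinot's implementation. As in 1b, a threshold of -1 (or any value <= 0) disables the conversion.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (illustrative, not Pinot's)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value) -> None:
        # 64-bit hash: first 8 bytes of SHA-1 over repr(value).
        h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> int:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            raw = self.m * math.log(self.m / zeros)
        return round(raw)


class DistinctThresholdExceeded(RuntimeError):
    """Raised in the 'return an error' mode (option 2). Hypothetical name."""


class DistinctCounter:
    """Exact set that converts to HyperLogLog once it grows past `threshold` (option 1).

    threshold <= 0 (e.g. -1) disables the check, so the set can grow without bound.
    With fail_on_threshold=True it raises instead of converting (option 2).
    """

    def __init__(self, threshold: int = -1, fail_on_threshold: bool = False) -> None:
        self.threshold = threshold
        self.fail_on_threshold = fail_on_threshold
        self.exact = set()
        self.hll = None  # becomes a HyperLogLog after conversion

    def add(self, value) -> None:
        if self.hll is not None:
            self.hll.add(value)
            return
        self.exact.add(value)
        if 0 < self.threshold < len(self.exact):
            if self.fail_on_threshold:
                raise DistinctThresholdExceeded(
                    f"distinct values exceeded {self.threshold}; use an approximate function"
                )
            # Replay every distinct value seen so far into the sketch, then free the set.
            self.hll = HyperLogLog()
            for v in self.exact:
                self.hll.add(v)
            self.exact = set()

    def is_approximate(self) -> bool:
        return self.hll is not None

    def result(self) -> int:
        return self.hll.estimate() if self.hll is not None else len(self.exact)
```

   Below the threshold the result stays exact. The conversion costs one pass over at most `threshold` values, after which memory is bounded by the fixed register array.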
      
   The reasons I don't prefer the second option, where we return an error and ask users to use distinctcounthll:
   - Users cannot switch to distinctcounthll up front, because it always returns an approximate result, even when the cardinality never hits the threshold.
   - Most users will not hit this error in testing and will see it for the first time in production, which is too late.
   - Pinot is mostly accessed programmatically via apps, and the app's end user cannot really do much when the app returns an error.
   - distinctcounthll is not standard SQL and won't work with other standard tools like Tableau, Superset, etc.
   
   My preference is to go with option 1 but start with -1 as the default value. This keeps the feature off by default while allowing it to be overridden with a per-query option or a server config. Users might still see an OOM when the distinct set does not fit in memory, but they can fix it via the server config (no code change needed) or a query option (requires a change in app code).
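   The default-plus-override behavior proposed above can be expressed as a small resolution function. This Python sketch is an assumption, not Pinot code. `resolve_threshold` and the server config key `approx_distinct_threshold` are hypothetical, because the comment names only the query option. The sketch also gives the per-query value precedence over the server config, an order the comment does not specify.

```python
from typing import Mapping

DEFAULT_THRESHOLD = -1  # -1 keeps the conversion off unless something overrides it

QUERY_OPTION = "enable_approx_distinct_threshold"  # query option named in the comment
SERVER_CONFIG_KEY = "approx_distinct_threshold"  # hypothetical: no server key is named


def resolve_threshold(query_options: Mapping[str, str], server_config: Mapping[str, str]) -> int:
    """Return the effective threshold: query option first, then server config, then default."""
    if QUERY_OPTION in query_options:
        return int(query_options[QUERY_OPTION])
    if SERVER_CONFIG_KEY in server_config:
        return int(server_config[SERVER_CONFIG_KEY])
    return DEFAULT_THRESHOLD
```

   With this ordering, a server-wide setting protects every app from OOMs, and an individual app can still opt out per query by passing -1.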
   
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

