kishoreg commented on pull request #8074:
URL: https://github.com/apache/pinot/pull/8074#issuecomment-1022688453
Can we list out all the options we have?
1. Automatically convert the set to HyperLogLog after a threshold
a. Threshold is set to something like 100K by default
b. Threshold is set to -1 by default, which means the feature is off and folks
can change it
c. User has the ability to control the threshold through a query option
(enable_approx_distinct_threshold=100,000)
2. Return an error if the threshold is reached
a. User then uses distinctcounthll
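To make the difference between the two options concrete, here is a minimal sketch of option 1. It is not Pinot code: the class name `ThresholdedDistinctCount`, the precision of 12 bits, and the SplitMix64-style hash are all assumptions chosen for illustration, and the HyperLogLog is a bare-bones version without the bias corrections a production library would apply. Under option 2, the conversion branch would throw an error instead.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: exact distinct count that falls back to HyperLogLog past a threshold. */
class ThresholdedDistinctCount {
    private static final int P = 12;            // assumed precision: 2^12 = 4096 registers
    private static final int M = 1 << P;

    private final int threshold;                // -1 means the feature is off (never convert)
    private Set<Long> exact = new HashSet<>();  // exact values, used until the threshold is crossed
    private byte[] registers;                   // non-null once converted to HyperLogLog

    ThresholdedDistinctCount(int threshold) {
        this.threshold = threshold;
    }

    void add(long value) {
        if (registers != null) {
            addToHll(value);
            return;
        }
        exact.add(value);
        if (threshold >= 0 && exact.size() > threshold) {
            // Option 1: switch representation instead of failing the query.
            // Option 2 would throw an exception here instead.
            registers = new byte[M];
            for (long v : exact) {
                addToHll(v);
            }
            exact = null;
        }
    }

    boolean isApproximate() {
        return registers != null;
    }

    long count() {
        if (registers == null) {
            return exact.size();
        }
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double estimate = alpha * M * M / sum;
        if (estimate <= 2.5 * M && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            estimate = M * Math.log((double) M / zeros);
        }
        return Math.round(estimate);
    }

    private void addToHll(long value) {
        long h = mix64(value);
        int idx = (int) (h >>> (64 - P));       // top P bits select the register
        long rest = h << P;                     // remaining bits determine the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // SplitMix64-style mixing function, used here as a stand-in hash.
    private static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }
}
```

With a threshold of -1 the counter stays exact no matter how many values it sees; with a non-negative threshold the result is exact until the set outgrows it and approximate afterwards, so queries that stay small are unaffected.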
The reasons why I don't prefer the second option, where we return an error and
ask users to use distinctcounthll:
- Users cannot simply switch to distinctcounthll, because it will always
return an approximate result even when the threshold is not hit.
- Most of them will not hit this error in testing and will first see it
in production, which is too late.
- Pinot is mostly accessed programmatically via apps, and the app user cannot
really do much when the app returns an error.
- distinctcounthll is not standard SQL and won't work with other
standard tools like Tableau, Superset, etc.
My preference is to go with option 1 but start with -1 as the default value,
which keeps the feature off by default while allowing it to be overridden
through a per-query option or server config.
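A rough sketch of how that override could be resolved is below. The class and method names are hypothetical, the query option key is the one suggested above, and the precedence (query option first, then server config, then -1) is an assumption rather than something decided in this PR.

```java
import java.util.Map;

/** Hypothetical helper: picks the effective threshold for one query. */
class ApproxDistinctThreshold {
    // Option name as proposed in this comment; not a confirmed Pinot option.
    static final String QUERY_OPTION_KEY = "enable_approx_distinct_threshold";
    static final int DISABLED = -1;

    /**
     * Assumed precedence: per-query option, then server config, then disabled.
     * serverConfigValue is null when the server config does not set a threshold.
     */
    static int resolve(Map<String, String> queryOptions, Integer serverConfigValue) {
        String raw = queryOptions.get(QUERY_OPTION_KEY);
        if (raw != null) {
            return Integer.parseInt(raw.trim());
        }
        if (serverConfigValue != null) {
            return serverConfigValue;
        }
        return DISABLED;
    }
}
```

With neither source set, the result is -1 and the conversion never happens, which matches the "off by default" behavior described above.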
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]