praveenc7 commented on PR #17411:
URL: https://github.com/apache/pinot/pull/17411#issuecomment-3717501228

   > overall lgtm!
   > 
   > One question, how was the default of 100_000 chosen as the threshold? Is 
that a desirable default for all queries going forward? Are there any scenarios 
where using the Integer.MAX_VALUE may be preferred? It'll be good to call this 
out in documentation as a guide for users
   > 
   > Also, can you take care of updating OSS documentation to reflect these 
changes? Thanks!
   
   Thanks for the review @somandal 
   
   On the 100,000 threshold?
   I chose 100K based on a combination of (a) micro-benchmarking across 
multiple recordCount sizes and cardinality ratios, and (b) validating behavior 
on the perf cluster with representative query patterns. The crossover point 
varies a bit by record count/cardinality ratio, but ~100K is a consistent “safe 
default” where the direct HLL path stops regressing and starts outperforming 
the RoaringBitmap-heavy path for  workloads. (I included representative 
benchmark tables in the Testing section)
   
   When would Integer.MAX_VALUE be preferred? 
   With the adaptive threshold in place, I don’t see a strong need to use 
Integer.MAX_VALUE in normal scenarios. The intent of the change is precisely to 
handle both low- and high-cardinality cases automatically. Added it as a 
guardrail if something odd shows up in production


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to