tdunning commented on pull request #2432: URL: https://github.com/apache/drill/pull/2432#issuecomment-1023645917
The best general default is the t-digest default. Even when you don't think you care about tail accuracy, it can creep in.

Furthermore, the loss of accuracy near the median is something like 2:1. The difference in accuracy near the tails, however, can be 1,000,000:1. So you don't lose much with the default setting and you gain a ton (in the right circumstances).

On Thu, Jan 27, 2022 at 10:38 AM James Turton ***@***.***> wrote:

> @tdunning <https://github.com/tdunning> thank you for clarifying. I speak
> under correction but the application of the t-digest here will be to help
> the query planner to estimate relation cardinalities, those being
> statistics which it will then use to decide e.g. which of two relations
> should be the "build side" of a hash join. Optimally, the smaller relation
> goes on the build side in that example.
>
> To estimate a relation's cardinality after a range predicate has been
> applied, the planner can reach for quantiles that with luck were precomputed
> over the relevant column. I can't see a reason why real-world range
> predicates would be more focused on the tails of distributions, the shapes
> of which we'd probably like to assume nothing about anyway.
>
> My take is that probably any t-digest scale factor is quite accurate
> enough for a query planner, but of them all I'd agree that K_0 is probably
> the best choice under the assumptions here. We can now either hard code
> Drill to K_0 or default it to K_0 and add a config option so that users can
> override. I'm not convinced the config option will justify its inclusion
> because it is one that users would need to vary from table to table
> depending on specifics of the data there. So I'm inclined to just hard code
> a good general default.
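For readers following the thread, here is a rough sketch of the kind of estimate described in the quoted message: turning precomputed quantiles into a row-count estimate for a range predicate. It is not Drill's planner code. The function names, the equi-depth boundary layout, and the linear interpolation inside each bucket are all assumptions made for illustration.

```python
from bisect import bisect_right


def cdf_from_quantiles(x, boundaries):
    """Approximate P(X <= x) from equi-depth quantile boundaries.

    Assumed layout (hypothetical): boundaries[i] is the column value at
    quantile i / (len(boundaries) - 1), so boundaries[0] is the minimum
    and boundaries[-1] is the maximum.
    """
    buckets = len(boundaries) - 1
    if x < boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    # Find the bucket with boundaries[i] <= x < boundaries[i + 1].
    i = bisect_right(boundaries, x) - 1
    lo, hi = boundaries[i], boundaries[i + 1]
    # Assumption: values are spread uniformly inside a bucket.
    return (i + (x - lo) / (hi - lo)) / buckets


def estimate_rows(row_count, boundaries, low, high):
    """Estimate how many rows satisfy low <= column <= high."""
    selectivity = cdf_from_quantiles(high, boundaries) - cdf_from_quantiles(low, boundaries)
    return row_count * max(0.0, selectivity)


# Quartile boundaries for a hypothetical 1,000-row column.
quartiles = [0, 10, 20, 40, 100]
estimated = estimate_rows(1000, quartiles, 5, 30)
```

A planner could compare estimates like this for the two inputs of a hash join and put the smaller one on the build side. The accuracy of the estimate is bounded by the accuracy of the stored quantiles, which is where the choice of t-digest scale function comes in.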
