2010YOUY01 commented on PR #19609: URL: https://github.com/apache/datafusion/pull/19609#issuecomment-3727624481
> I didn't have time yet to look at this big PR, but I looked at the issue and design. As a general thought I think replacing the `evaluate_bounds`/`propagate_constraints` duo will not be easy. The `Interval` library makes very careful calculations w.r.t. things like rounding (with floats etc.), because their results are used to take branches in contexts like pruning join hash tables that may operate on data like floats. Support for such rounding-aware calculations was not present in `arrow-rs` at the time when we created these APIs (and still not here if I'm not missing something).

This is a really good point. I haven't considered rounding safety so far; I'll make sure the vectorized version handles it as well.

By the way, @ozankabak, do you have any references on the high-level ideas behind “join pruning” and on why we need the inverse path (`propagate_constraints()`)? I'm asking purely out of curiosity.

Thanks for the context. For now, I think we shouldn’t touch the existing statistics propagation APIs and should instead introduce a separate vectorized one in this work. I’ll add more documentation to explain the rationale.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
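To illustrate the rounding concern raised in the quoted comment, here is a minimal sketch of outward-rounded interval addition. It is not DataFusion's `Interval` implementation; the names `Bounds`, `next_up`, `next_down`, and `add_outward` are hypothetical and exist only for this example. The idea, as I understand it, is that each bound is nudged one representable float outward after the operation, so the result is guaranteed to enclose the exact sum even when the floating-point addition rounds in the wrong direction.

```rust
/// Hypothetical helper: smallest f64 strictly greater than `x`.
/// NaN and +infinity are returned unchanged.
fn next_up(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Covers both +0.0 and -0.0: step to the smallest positive subnormal.
        return f64::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        f64::from_bits(bits + 1) // larger magnitude, so a larger value
    } else {
        f64::from_bits(bits - 1) // smaller magnitude, so closer to zero
    }
}

/// Hypothetical helper: largest f64 strictly less than `x`.
fn next_down(x: f64) -> f64 {
    -next_up(-x)
}

/// Hypothetical closed interval [lower, upper] over f64.
#[derive(Debug, Clone, Copy)]
struct Bounds {
    lower: f64,
    upper: f64,
}

impl Bounds {
    /// Adds two intervals and widens each bound by one representable step,
    /// so the result always contains the exact (real-number) sum.
    fn add_outward(self, other: Bounds) -> Bounds {
        Bounds {
            lower: next_down(self.lower + other.lower),
            upper: next_up(self.upper + other.upper),
        }
    }

    /// Returns true if `v` lies inside the closed interval.
    fn contains(self, v: f64) -> bool {
        self.lower <= v && v <= self.upper
    }
}
```

For example, adding the point intervals `[0.1, 0.1]` and `[0.2, 0.2]` yields a lower bound equal to the `f64` value `0.3`, whereas plain `0.1 + 0.2` rounds up to `0.30000000000000004`, slightly above the exact sum of the two stored operands. A pruning decision based on the unwidened lower bound could therefore discard a value that should have been kept. Always widening by one step is conservative; it trades a slightly looser interval for safety.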
