zhztheplayer opened a new pull request, #5435: URL: https://github.com/apache/incubator-gluten/pull/5435
Velox's bloom-filter aggregate and filter functions are logically different from Spark's. As a result, the Gluten/Spark Filter and Aggregate operators that host them also behave differently from their vanilla Spark counterparts. For logical differences like this, the Velox backend should use distinct functions to mark the implementation rather than reusing Spark's function types.

This patch:

1. Adds `VeloxBloomFilterMightContain` / `VeloxBloomFilterAggregate`.
2. When transforming `FilterExec` to `FilterExecTransformer`, transforms `BloomFilterMightContain` / `BloomFilterAggregate` to `VeloxBloomFilterMightContain` / `VeloxBloomFilterAggregate` at the same time.
3. Removes the rule `FallbackBloomFilterAggIfNeeded`.

The patch makes the relevant code safer than before, because there are now two explicit function pairs: `BloomFilterMightContain` / `BloomFilterAggregate` and `VeloxBloomFilterMightContain` / `VeloxBloomFilterAggregate`. A mismatch between the aggregate and filter functions can therefore be detected more easily by checking their implementation types, rather than by failing the query at runtime.

The patch does not yet fix the failing test case in https://github.com/apache/incubator-gluten/pull/5433. However, the failure will be reported faster by Spark, since vanilla Spark doesn't implement the function `VeloxBloomFilterMightContain`.
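To illustrate the rewrite and the type-based mismatch check described above, here is a minimal Python sketch. It is a toy model, not Gluten code: `Expr`, `to_velox`, and `mismatches` are hypothetical names, and nesting the aggregate under a `ScalarSubquery` child of the filter function is a simplifying assumption about the plan shape. Only the four bloom-filter function names come from the description.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical; stands in for a Catalyst expression)."""
    name: str
    children: Tuple["Expr", ...] = ()


# Spark function -> Velox-specific counterpart (names from the PR description).
VELOX_COUNTERPARTS = {
    "BloomFilterMightContain": "VeloxBloomFilterMightContain",
    "BloomFilterAggregate": "VeloxBloomFilterAggregate",
}

# Each filter function and the only aggregate implementation it may consume.
PAIRS = {
    "BloomFilterMightContain": "BloomFilterAggregate",
    "VeloxBloomFilterMightContain": "VeloxBloomFilterAggregate",
}
AGGREGATES = set(PAIRS.values())


def to_velox(expr: Expr) -> Expr:
    """Rewrite filter and aggregate functions together, in one pass."""
    children = tuple(to_velox(child) for child in expr.children)
    return Expr(VELOX_COUNTERPARTS.get(expr.name, expr.name), children)


def walk(expr: Expr) -> Iterator[Expr]:
    """Yield the node and all of its descendants, pre-order."""
    yield expr
    for child in expr.children:
        yield from walk(child)


def mismatches(expr: Expr) -> List[Tuple[str, str]]:
    """List (filter, aggregate) pairs whose implementation types disagree."""
    found = []
    for node in walk(expr):
        expected = PAIRS.get(node.name)
        if expected is None:
            continue
        for child in node.children:
            for inner in walk(child):
                if inner.name in AGGREGATES and inner.name != expected:
                    found.append((node.name, inner.name))
    return found


# A vanilla Spark filter whose bloom filter is built by a Spark aggregate.
spark_filter = Expr("BloomFilterMightContain", (
    Expr("ScalarSubquery", (Expr("BloomFilterAggregate", (Expr("value"),)),)),
    Expr("value"),
))

# Rewriting both functions at once keeps the pair consistent.
velox_filter = to_velox(spark_filter)

# Rewriting only the filter function leaves a detectable mismatch.
half_rewritten = Expr("VeloxBloomFilterMightContain", spark_filter.children)
```

Here `mismatches(velox_filter)` is empty, while `mismatches(half_rewritten)` reports the Velox filter function paired with the Spark aggregate. In this model the inconsistency shows up by inspecting the tree, before anything runs.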
