+1 for this. It would be better to provide some filter converters to facilitate engine integration, e.g. a converter from the Presto domain to the Hudi domain.
Also, I have already finished a first version of data skipping, partition pruning, and filter pushdown for Presto:
https://github.com/xiarixiaoyao/presto/commit/800646608d4b88799de0addcddd97d03592954ce
Maybe we can work together?

mengtao0...@qq.com

------------------ Original Message ------------------
From: "dev" <vin...@apache.org>
Sent: Thursday, August 11, 2022, 12:11
To: "dev" <dev@hudi.apache.org>
Subject: Re: [DISCUSS]: Integrate column stats index with all query engines

+1 for this. Suggested new reviewers on the RFC.
https://github.com/apache/hudi/pull/6345/files#r943073339

On Wed, Aug 10, 2022 at 9:56 PM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:

> Hello community,
>
> With the introduction of multi modal index in Hudi, there is a lot of scope
> for improvement on the querying side. There are 2 major ways of reducing
> the data scan at the time of querying - partition pruning and file pruning.
> While with the latest developments in the community, partition pruning is
> supported for commonly used query engines like spark, presto and hive, File
> pruning using column stats index is only supported for spark and flink.
>
> We intend to support data skipping for the rest of the engines as well
> which include hive, presto and trino. I have written a draft RFC here -
> https://github.com/apache/hudi/pull/6345.
>
> Please take a look and let me know what you think. Once we have some
> feedback from the community, we can decide on the next steps.
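
As a rough illustration of the file pruning discussed in this thread, here is a minimal Python sketch. It assumes each data file carries per-column min/max statistics and that the engine's filter has already been converted into simple closed ranges per column. The names `ColumnRange`, `FileStats`, `may_match`, and `prune_files` are hypothetical and are not Hudi or Presto APIs. The real column stats index and engine domain types are richer than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnRange:
    """Hypothetical engine-neutral filter: closed interval [low, high].

    None means the interval is unbounded on that side.
    """
    low: Optional[int] = None
    high: Optional[int] = None


@dataclass
class FileStats:
    """Hypothetical per-file column statistics (min, max) keyed by column."""
    path: str
    min_max: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def may_match(stats: FileStats, predicate: Dict[str, ColumnRange]) -> bool:
    """Return False only when the stats prove no row in the file can match."""
    for column, wanted in predicate.items():
        if column not in stats.min_max:
            # No stats for this column: the file cannot be skipped safely.
            continue
        file_min, file_max = stats.min_max[column]
        if wanted.low is not None and file_max < wanted.low:
            return False  # every value in the file is below the range
        if wanted.high is not None and file_min > wanted.high:
            return False  # every value in the file is above the range
    return True


def prune_files(files: List[FileStats],
                predicate: Dict[str, ColumnRange]) -> List[str]:
    """Keep only the files that might contain matching rows."""
    return [f.path for f in files if may_match(f, predicate)]


files = [
    FileStats("a.parquet", {"ts": (0, 10)}),
    FileStats("b.parquet", {"ts": (20, 30)}),
    FileStats("c.parquet"),  # no stats recorded for this file
]
# Filter equivalent to: ts BETWEEN 15 AND 25
print(prune_files(files, {"ts": ColumnRange(low=15, high=25)}))
```

Skipping is conservative in this sketch: a file is dropped only when its stats rule out a match, so files without stats are always scanned.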