Re: [DISCUSS]: Integrate column stats index with all query engines

Pratyaksh Sharma Wed, 10 Aug 2022 22:59:56 -0700

Surely we can work together once we get some feedback on the RFC, Meng!

On Thu, Aug 11, 2022 at 9:32 AM 1037817390 <mengtao0...@qq.com.invalid>
wrote:


> +1 for this.
> It would be better to provide some filter converters to facilitate
> engine integration,
> e.g. a converter from the Presto domain to the Hudi domain.
>
>
>
> I have already finished the first version of data skipping / partition
> pruning / filter pushdown for Presto:
>
> https://github.com/xiarixiaoyao/presto/commit/800646608d4b88799de0addcddd97d03592954ce
>
> Maybe we can work together.
>
>
>
>
>
>
>
> 孟涛
> mengtao0...@qq.com
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "dev" <vin...@apache.org>;
> Sent: Thursday, August 11, 2022, 12:11 PM
> To: "dev" <dev@hudi.apache.org>;
>
> Subject: Re: [DISCUSS]: Integrate column stats index with all query engines
>
>
>
> +1 for this.
>
> Suggested new reviewers on the RFC.
> https://github.com/apache/hudi/pull/6345/files#r943073339
>
> On Wed, Aug 10, 2022 at 9:56 PM Pratyaksh Sharma <pratyaks...@gmail.com>
> wrote:
>
> > Hello community,
> >
> > With the introduction of the multi-modal index in Hudi, there is a lot of
> > scope for improvement on the querying side. There are two major ways of
> > reducing the data scanned at query time: partition pruning and file
> > pruning. With the latest developments in the community, partition pruning
> > is supported for commonly used query engines like Spark, Presto and Hive,
> > but file pruning using the column stats index is only supported for Spark
> > and Flink.
> >
> > We intend to support data skipping for the rest of the engines as well,
> > which include Hive, Presto and Trino. I have written a draft RFC here:
> > https://github.com/apache/hudi/pull/6345.
> >
> > Please take a look and let me know what you think. Once we have some
> > feedback from the community, we can decide on the next steps.
> >
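
For readers new to the thread, the file pruning idea mentioned above can be illustrated with a minimal sketch. It is not Hudi's implementation or API: the `ColumnStats` record, the `prune_files` helper, and the file names are all hypothetical, and it assumes a single integer column with a closed range filter. The point is only that per-file min/max values let an engine skip files that cannot contain matching rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-file statistics for one column (not Hudi's schema)."""
    file_name: str
    min_value: int
    max_value: int


def prune_files(stats, lower, upper):
    """Keep files whose [min, max] range can overlap lower <= col <= upper."""
    return [
        s.file_name
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics for three files (made-up values).
stats = [
    ColumnStats("file_a.parquet", 0, 99),
    ColumnStats("file_b.parquet", 100, 199),
    ColumnStats("file_c.parquet", 200, 299),
]

# Filter: 150 <= col <= 220. file_a cannot contain matching rows, so it is skipped.
candidates = prune_files(stats, 150, 220)
print(candidates)
```

An engine-specific filter converter, as suggested earlier in the thread, would sit in front of a step like this and translate the engine's predicate representation into the bounds being checked.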
