Note:

Because testing resources and configurations differ, the performance data below
is for reference only and should not be compared directly with other results.

On 2024/09/30 07:21:59 Chang Chen wrote:
> Two years ago, we initiated the Gluten project in collaboration with Intel,
> aiming to enhance data processing capabilities and optimize performance for
> big data applications. One year ago, we decided to extend our support to
> the MergeTree storage engine on cloud platforms to improve scan
> performance relative to Parquet. This report marks the first comprehensive
> update on the current status of gluten-clickhouse-backend.
> 
> We have successfully added support for Spark versions 3.2, 3.3, and 3.5;
> however, it is important to note that we do not plan to implement support
> for Spark version 3.4 due to resource considerations.
> 
> Moving forward, I will endeavor to provide weekly updates regarding our
> progress and developments within this project. Over the past month, several
> key pull requests (PRs) stand out:
> 
>    - Support spark 3.2, 3.3 and 3.5, we don't plan to support spark 3.4, with
>    - Now, we fully support tpcds and tpch
>       - https://github.com/apache/incubator-gluten/pull/7072 and
>         https://github.com/apache/incubator-gluten/issues/7180
>       - https://github.com/apache/incubator-gluten/pull/7176
>    - Improve decimal performance:
>      https://github.com/apache/incubator-gluten/pull/7196
>    - MergeTree meta: https://github.com/apache/incubator-gluten/pull/7239
> 
> To illustrate our current benchmarking results using TPCDS with a scale
> factor of 100 across one master node and three worker nodes equipped with a
> total of 48 cores:
> Spark version 3.5.1               time(s)
> Vanilla Spark [Local Parquet]     589.5
> Gluten[master local Parquet]*     321.9
> Gluten[master mergetree on s3]    279.7
> 
> Notes:
> 
>    1. Because we cache MergeTree structures on local disk during testing,
>       the Parquet files were placed directly on the local worker nodes.
>       This keeps the comparison fair by minimizing I/O latency that could
>       otherwise skew the benchmark results.
>    2. The current benchmark includes fixes for the previously identified
>       issues detailed here:
>       https://github.com/apache/incubator-gluten/issues/7394.
> 
> We would like to thank all contributors who have played an integral role
> in advancing this project, particularly lgbo, 李杨, shuai, Zhizhao Zhao,
> and Liu Neng. Their collaborative efforts continue to drive innovation
> within the Gluten framework while enhancing its overall functionality and
> user experience.
> 
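
For reference, the speedups implied by the quoted TPCDS timings can be worked
out with a small script. This is only an illustrative sketch: the dictionary
keys and the `speedups` helper are names chosen here and are not part of
Gluten, and the three timings are copied from the quoted table. The caveat in
the note at the top about comparability still applies.

```python
# Total TPCDS (scale factor 100) times in seconds, copied from the quoted table.
times_s = {
    "Vanilla Spark [Local Parquet]": 589.5,
    "Gluten[master local Parquet]": 321.9,
    "Gluten[master mergetree on s3]": 279.7,
}


def speedups(times, baseline):
    """Return baseline_time / time for every configuration (hypothetical helper)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


result = speedups(times_s, "Vanilla Spark [Local Parquet]")
for name, ratio in result.items():
    # Ratios above 1.0 mean the configuration finished faster than the baseline.
    print(f"{name}: {ratio:.2f}x")
```

With these inputs the ratios come out at roughly 1.83x for Gluten on local
Parquet and 2.11x for Gluten with MergeTree on S3, both relative to vanilla
Spark.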
