Note: Due to differences in testing resources and configurations, the performance data is for reference only and is not directly comparable.
On 2024/09/30 07:21:59 Chang Chen wrote:
> Two years ago, we initiated the Gluten project in collaboration with Intel,
> aiming to enhance data processing capabilities and optimize performance for
> big data applications. One year ago, we decided to extend our support to
> the MergeTree storage engine on cloud platforms to improve scan
> performance over Parquet. This report is the first comprehensive update
> on the current status of gluten-clickhouse-backend.
>
> We have successfully added support for Spark versions 3.2, 3.3, and 3.5;
> however, we do not plan to implement support for Spark 3.4 due to
> resource considerations.
>
> Going forward, I will try to provide weekly updates on our progress and
> developments within this project. Over the past month, several key pull
> requests (PRs) stand out:
>
> - Support Spark 3.2, 3.3, and 3.5; we don't plan to support Spark 3.4, with
> - Now, we fully support TPC-DS and TPC-H
> - https://github.com/apache/incubator-gluten/pull/7072 and
>   https://github.com/apache/incubator-gluten/issues/7180
> - https://github.com/apache/incubator-gluten/pull/7176
> - Improve decimal performance:
>   https://github.com/apache/incubator-gluten/pull/7196
> - MergeTree meta: https://github.com/apache/incubator-gluten/pull/7239
>
> Our current TPC-DS benchmark results (scale factor 100; one master node
> and three worker nodes with 48 cores in total) are:
>
>   Spark 3.5.1                         Time (s)
>   Vanilla Spark [local Parquet]          589.5
>   Gluten [master, local Parquet]*        321.9
>   Gluten [master, MergeTree on S3]       279.7
>
> Notes:
>
> 1. Because MergeTree data is cached on local disk during testing, the
>    Parquet files were placed directly on the worker nodes' local disks to
>    keep the comparison fair, so that differences in I/O latency do not
>    skew the results.
> 2. The current benchmark includes fixes for the previously identified
>    issues described in
>    https://github.com/apache/incubator-gluten/issues/7394.
>
> We would like to thank all contributors who have played an integral role
> in advancing this project's objectives, particularly lgbo, 李杨, shuai,
> Zhizhao Zhao, and Liu Neng. Their collaborative efforts continue to drive
> innovation within the Gluten framework while improving its overall
> functionality and user experience.
