Two years ago, we initiated the Gluten project in collaboration with Intel,
aiming to enhance data processing capabilities and optimize performance for
big data applications. One year ago, we decided to extend our support to
the MergeTree storage engine on cloud platforms, to improve scan
performance compared with Parquet. This report marks the first comprehensive
update on the current status of gluten-clickhouse-backend.

We have successfully added support for Spark versions 3.2, 3.3, and 3.5;
however, it is important to note that we do not plan to implement support
for Spark version 3.4 due to resource considerations.

Moving forward, I will endeavor to provide weekly updates regarding our
progress and developments within this project. Over the past month, several
key pull requests (PRs) stand out:

   - Support Spark 3.2, 3.3 and 3.5; we don't plan to support Spark 3.4, with
   - Now, we fully support TPCDS and TPCH
      - https://github.com/apache/incubator-gluten/pull/7072 and
        https://github.com/apache/incubator-gluten/issues/7180
      - https://github.com/apache/incubator-gluten/pull/7176
   - Improve decimal performance:
     https://github.com/apache/incubator-gluten/pull/7196
   - MergeTree meta: https://github.com/apache/incubator-gluten/pull/7239

Current benchmark results for TPCDS at scale factor 100, run on one master
node and three worker nodes with a total of 48 cores (Spark version 3.5.1):

Configuration                      Time (s)
Vanilla Spark [Local Parquet]      589.5
Gluten [master local Parquet]*     321.9
Gluten [master mergetree on s3]    279.7

Notes:

   1. Because we cache MergeTree structures locally on disk during testing,
      the Parquet files were placed directly on the local worker nodes. This
      keeps the comparison fair by minimizing IO latency that could
      otherwise skew the benchmark results.
   2. The current benchmark includes the fixes for the issues described in
      https://github.com/apache/incubator-gluten/issues/7394.
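
For readers who want the relative gains rather than raw seconds, the timings in the benchmark table above can be turned into speedup ratios. The following is a minimal Python sketch; the dictionary keys and the `speedup` helper are illustrative names only, not part of Gluten, and the numbers are copied from the table.

```python
# Total TPCDS runtimes in seconds, copied from the benchmark table.
runtimes_s = {
    "Vanilla Spark [Local Parquet]": 589.5,
    "Gluten [master local Parquet]": 321.9,
    "Gluten [master mergetree on s3]": 279.7,
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Use vanilla Spark as the baseline for every configuration.
baseline = runtimes_s["Vanilla Spark [Local Parquet]"]
speedups = {name: speedup(baseline, t) for name, t in runtimes_s.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

Relative to vanilla Spark, this works out to roughly 1.8x for Gluten on local Parquet and roughly 2.1x for Gluten with MergeTree on S3.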

We would like to thank all contributors who have played an integral role in
advancing this project, particularly lgbo, 李杨, shuai, Zhizhao Zhao, and
Liu Neng. Their collaborative efforts continue to drive innovation within
the Gluten framework while improving its functionality and user experience.
