This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-staging in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push: new f1d6da6 Commit build products f1d6da6 is described below commit f1d6da64d930c0b65091595aeace9bd2a088dc51 Author: Build Pelican (action) <priv...@infra.apache.org> AuthorDate: Thu Mar 20 21:46:42 2025 +0000 Commit build products --- blog/2025/03/20/datafusion-comet-0.7.0/index.html | 16 +++++++++------- blog/feeds/all-en.atom.xml | 16 +++++++++------- blog/feeds/blog.atom.xml | 16 +++++++++------- blog/feeds/pmc.atom.xml | 16 +++++++++------- 4 files changed, 36 insertions(+), 28 deletions(-) diff --git a/blog/2025/03/20/datafusion-comet-0.7.0/index.html b/blog/2025/03/20/datafusion-comet-0.7.0/index.html index 763cd9e..92044f5 100644 --- a/blog/2025/03/20/datafusion-comet-0.7.0/index.html +++ b/blog/2025/03/20/datafusion-comet-0.7.0/index.html @@ -95,16 +95,18 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet will now do a better job of picking the optimal build side. Thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the <a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature.</p> -<h2>Experimental Support for DataFusion’s DataSourceExec</h2> -<p>It is now possible to configure Comet to use DataFusion’s <code>DataSourceExec</code> instead of Comet’s current Parquet reader. -Support should still be considered experimental, but most of Comet’s unit tests are now passing with the new reader. +<h2>Experimental Support for DataFusion’s Parquet Scan</h2> +<p>It is now possible to configure Comet to use DataFusion’s Parquet reader instead of Comet’s current Parquet reader. 
This +has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's +existing reader.</p> +<p>Support should still be considered experimental, but most of Comet’s unit tests are now passing with the new reader. Known issues include handling of <code>INT96</code> timestamps and unsigned bytes and shorts.</p> -<p>To enable DataFusion’s <code>DataSourceExec</code>, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment +<p>To enable DataFusion’s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> <h2>Complex Type Support</h2> -<p>With DataFusion’s <code>DataSourceExec</code> enabled, there is now some early support for reading structs from Parquet. This is -largely untested and we would welcome additional testing from the community to help determine what is and isn’t working, -as well as contributions to improve support for structs and other complex types. The tracking issue is +<p>With DataFusion’s Parquet reader enabled, there is now some early support for reading structs from Parquet. This is +not thoroughly tested yet. We would welcome additional testing from the community to help determine what is and isn’t +working, as well as contributions to improve support for structs and other complex types. The tracking issue is <a href="https://github.com/apache/datafusion-comet/issues/1043">https://github.com/apache/datafusion-comet/issues/1043</a>.</p> <h2>Updates to supported Spark versions</h2> <ul> diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml index 037f4fa..2dc033d 100644 --- a/blog/feeds/all-en.atom.xml +++ b/blog/feeds/all-en.atom.xml @@ -72,16 +72,18 @@ stored locally in Parquet format on NVMe storage. 
Spark was running in Kubernete <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet will now do a better job of picking the optimal build side. Thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the <a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature.</p> -<h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2> -<p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. -Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. +<h2>Experimental Support for DataFusion&rsquo;s Parquet Scan</h2> +<p>It is now possible to configure Comet to use DataFusion&rsquo;s Parquet reader instead of Comet&rsquo;s current Parquet reader. This +has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's +existing reader.</p> +<p>Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. Known issues include handling of <code>INT96</code> timestamps and unsigned bytes and shorts.</p> -<p>To enable DataFusion&rsquo;s <code>DataSourceExec</code>, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment +<p>To enable DataFusion&rsquo;s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> <h2>Complex Type Support</h2> -<p>With DataFusion&rsquo;s <code>DataSourceExec</code> enabled, there is now some early support for reading structs from Parquet. 
This is -largely untested and we would welcome additional testing from the community to help determine what is and isn&rsquo;t working, -as well as contributions to improve support for structs and other complex types. The tracking issue is +<p>With DataFusion&rsquo;s Parquet reader enabled, there is now some early support for reading structs from Parquet. This is +not thoroughly tested yet. We would welcome additional testing from the community to help determine what is and isn&rsquo;t +working, as well as contributions to improve support for structs and other complex types. The tracking issue is <a href="https://github.com/apache/datafusion-comet/issues/1043">https://github.com/apache/datafusion-comet/issues/1043</a>.</p> <h2>Updates to supported Spark versions</h2> <ul> diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml index c9d12e0..5de58a8 100644 --- a/blog/feeds/blog.atom.xml +++ b/blog/feeds/blog.atom.xml @@ -72,16 +72,18 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet will now do a better job of picking the optimal build side. Thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the <a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature.</p> -<h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2> -<p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. -Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. 
+<h2>Experimental Support for DataFusion&rsquo;s Parquet Scan</h2> +<p>It is now possible to configure Comet to use DataFusion&rsquo;s Parquet reader instead of Comet&rsquo;s current Parquet reader. This +has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's +existing reader.</p> +<p>Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. Known issues include handling of <code>INT96</code> timestamps and unsigned bytes and shorts.</p> -<p>To enable DataFusion&rsquo;s <code>DataSourceExec</code>, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment +<p>To enable DataFusion&rsquo;s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> <h2>Complex Type Support</h2> -<p>With DataFusion&rsquo;s <code>DataSourceExec</code> enabled, there is now some early support for reading structs from Parquet. This is -largely untested and we would welcome additional testing from the community to help determine what is and isn&rsquo;t working, -as well as contributions to improve support for structs and other complex types. The tracking issue is +<p>With DataFusion&rsquo;s Parquet reader enabled, there is now some early support for reading structs from Parquet. This is +not thoroughly tested yet. We would welcome additional testing from the community to help determine what is and isn&rsquo;t +working, as well as contributions to improve support for structs and other complex types. 
The tracking issue is <a href="https://github.com/apache/datafusion-comet/issues/1043">https://github.com/apache/datafusion-comet/issues/1043</a>.</p> <h2>Updates to supported Spark versions</h2> <ul> diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml index 60b6dbe..9bbf15c 100644 --- a/blog/feeds/pmc.atom.xml +++ b/blog/feeds/pmc.atom.xml @@ -72,16 +72,18 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet will now do a better job of picking the optimal build side. Thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the <a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature.</p> -<h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2> -<p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. -Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. +<h2>Experimental Support for DataFusion&rsquo;s Parquet Scan</h2> +<p>It is now possible to configure Comet to use DataFusion&rsquo;s Parquet reader instead of Comet&rsquo;s current Parquet reader. This +has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's +existing reader.</p> +<p>Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader. 
Known issues include handling of <code>INT96</code> timestamps and unsigned bytes and shorts.</p> -<p>To enable DataFusion&rsquo;s <code>DataSourceExec</code>, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment +<p>To enable DataFusion&rsquo;s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> <h2>Complex Type Support</h2> -<p>With DataFusion&rsquo;s <code>DataSourceExec</code> enabled, there is now some early support for reading structs from Parquet. This is -largely untested and we would welcome additional testing from the community to help determine what is and isn&rsquo;t working, -as well as contributions to improve support for structs and other complex types. The tracking issue is +<p>With DataFusion&rsquo;s Parquet reader enabled, there is now some early support for reading structs from Parquet. This is +not thoroughly tested yet. We would welcome additional testing from the community to help determine what is and isn&rsquo;t +working, as well as contributions to improve support for structs and other complex types. The tracking issue is <a href="https://github.com/apache/datafusion-comet/issues/1043">https://github.com/apache/datafusion-comet/issues/1043</a>.</p> <h2>Updates to supported Spark versions</h2> <ul>