This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 7ab5fd2c8 Publish built docs triggered by 17a36bcfecd401d43df11f276bfd6b9259a9fa5d
7ab5fd2c8 is described below

commit 7ab5fd2c87757c7cac2eb7b81ece002675d0bf5e
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Jun 13 19:42:46 2025 +0000

    Publish built docs triggered by 17a36bcfecd401d43df11f276bfd6b9259a9fa5d
---
 _sources/user-guide/compatibility.md.txt | 11 +++++++++++
 _sources/user-guide/configs.md.txt       |  2 +-
 searchindex.js                           |  2 +-
 user-guide/compatibility.html            |  8 ++++++++
 user-guide/configs.html                  |  2 +-
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/_sources/user-guide/compatibility.md.txt b/_sources/user-guide/compatibility.md.txt
index 39cd3a058..961209075 100644
--- a/_sources/user-guide/compatibility.md.txt
+++ b/_sources/user-guide/compatibility.md.txt
@@ -50,6 +50,8 @@ implementation:
 The new scans currently have the following limitations:
+Issues common to both `native_datafusion` and `native_iceberg_compat`:
+
 - When reading Parquet files written by systems other than Spark that contain columns with the logical types `UINT_8` or `UINT_16`, Comet will produce different results than Spark because Spark does not preserve or understand these logical types. Arrow-based readers, such as DataFusion and Comet do respect these types and read the data as unsigned
@@ -58,12 +60,21 @@ types (regardless of the logical type). This behavior can be disabled by setting
 `spark.comet.scan.allowIncompatible=true`.
 - There is a known performance issue when pushing filters down to Parquet. See the [Comet Tuning Guide] for more information.
+- Reading maps containing complex types can result in errors or incorrect results [#1754]
+- `PARQUET_FIELD_ID_READ_ENABLED` is not respected [#1758]
 - There are failures in the Spark SQL test suite when enabling these new scans (tracking issues: [#1542] and [#1545]).
 - No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
 - Setting Spark configs `ignoreMissingFiles` or `ignoreCorruptFiles` to `true` is not compatible with `native_datafusion` scan.
+Issues specific to `native_datafusion`:
+
+- Bucketed scans are not supported
+- No support for row indexes
+
 [#1545]: https://github.com/apache/datafusion-comet/issues/1545
 [#1542]: https://github.com/apache/datafusion-comet/issues/1542
+[#1754]: https://github.com/apache/datafusion-comet/issues/1754
+[#1758]: https://github.com/apache/datafusion-comet/issues/1758
 [Comet Tuning Guide]: tuning.md
 ## ANSI mode
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 517ce960c..f27830320 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -83,7 +83,7 @@ Comet provides the following configuration settings.
 | spark.comet.parquet.read.parallel.io.enabled | Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
 | spark.comet.parquet.read.parallel.io.thread-pool.size | The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 | spark.comet.regexp.allowIncompatible | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
-| spark.comet.scan.allowIncompatible | Comet is not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
+| spark.comet.scan.allowIncompatible | Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | spark.comet.scan.enabled | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | true |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
diff --git a/searchindex.js b/searchindex.js
index f304a4312..d8845b88c 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], "2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[13, null]], "Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, null]], "Adding Spark-side Tests for the New Expression": [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], "2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[13, null]], "Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, null]], "Adding Spark-side Tests for the New Expression": [...]
\ No newline at end of file
diff --git a/user-guide/compatibility.html b/user-guide/compatibility.html
index a09f16ae5..7d1b19f68 100644
--- a/user-guide/compatibility.html
+++ b/user-guide/compatibility.html
@@ -429,6 +429,7 @@ implementation:</p>
 <li><p>Improves performance</p></li>
 </ul>
 <p>The new scans currently have the following limitations:</p>
+<p>Issues common to both <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code> and <code class="docutils literal notranslate"><span class="pre">native_iceberg_compat</span></code>:</p>
 <ul class="simple">
 <li><p>When reading Parquet files written by systems other than Spark that contain columns with the logical types <code class="docutils literal notranslate"><span class="pre">UINT_8</span></code> or <code class="docutils literal notranslate"><span class="pre">UINT_16</span></code>, Comet will produce different results than Spark because Spark does not preserve or understand these
@@ -438,10 +439,17 @@ types (regardless of the logical type). This behavior can be disabled by setting <code class="docutils literal notranslate"><span class="pre">spark.comet.scan.allowIncompatible=true</span></code>.</p></li>
 <li><p>There is a known performance issue when pushing filters down to Parquet. See the <a class="reference internal" href="tuning.html"><span class="std std-doc">Comet Tuning Guide</span></a> for more information.</p></li>
+<li><p>Reading maps containing complex types can result in errors or incorrect results <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1754">#1754</a></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">PARQUET_FIELD_ID_READ_ENABLED</span></code> is not respected <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1758">#1758</a></p></li>
 <li><p>There are failures in the Spark SQL test suite when enabling these new scans (tracking issues: <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1542">#1542</a> and <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</p></li>
 <li><p>No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.</p></li>
 <li><p>Setting Spark configs <code class="docutils literal notranslate"><span class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal notranslate"><span class="pre">ignoreCorruptFiles</span></code> to <code class="docutils literal notranslate"><span class="pre">true</span></code> is not compatible with <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code> scan.</p></li>
 </ul>
+<p>Issues specific to <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code>:</p>
+<ul class="simple">
+<li><p>Bucketed scans are not supported</p></li>
+<li><p>No support for row indexes</p></li>
+</ul>
 </section>
 <section id="ansi-mode">
 <h2>ANSI mode<a class="headerlink" href="#ansi-mode" title="Link to this heading">¶</a></h2>
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 43e518183..cdd589429 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -565,7 +565,7 @@ TO MODIFY THIS CONTENT MAKE SURE THAT YOU MAKE YOUR CHANGES TO THE TEMPLATE FILE
 <td><p>false</p></td>
 </tr>
 <tr class="row-even"><td><p>spark.comet.scan.allowIncompatible</p></td>
-<td><p>Comet is not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html).</p></td>
+<td><p>Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html).</p></td>
 <td><p>false</p></td>
 </tr>
 <tr class="row-odd"><td><p>spark.comet.scan.enabled</p></td>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org
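
For context only, here is a minimal, hypothetical sketch (not taken from this commit) of how the settings referenced in the documentation changes above are typically applied to a Spark session in Scala. The jar placement, master, application name, and data path are placeholders; the config keys come from the configs table, and the plugin class follows the Comet user guide.

// Hypothetical illustration: enabling Comet scans and the
// spark.comet.scan.allowIncompatible escape hatch discussed above.
import org.apache.spark.sql.SparkSession

object CometScanConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("comet-scan-config-example") // placeholder name
      .master("local[*]")                   // placeholder; any cluster manager works
      // Comet is loaded as a Spark plugin; the Comet jar must be on the classpath
      // (for example, submitted with --jars).
      .config("spark.plugins", "org.apache.spark.CometPlugin")
      // Per the configs table, native vectorized execution needs both of these.
      .config("spark.comet.scan.enabled", "true")
      .config("spark.comet.exec.enabled", "true")
      // Opt in to scan behavior that is not fully Spark-compatible, e.g. Parquet
      // files containing UINT_8 / UINT_16 columns (see the compatibility guide).
      .config("spark.comet.scan.allowIncompatible", "true")
      // Optional knobs from the same table, shown with their documented defaults.
      .config("spark.comet.parquet.read.parallel.io.enabled", "true")
      .config("spark.comet.scan.preFetch.enabled", "false")
      .getOrCreate()

    // Placeholder path: with the settings above, supported Parquet reads go
    // through Comet's native scan.
    val df = spark.read.parquet("/tmp/example.parquet")
    df.show()

    spark.stop()
  }
}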