This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 7ab5fd2c8 Publish built docs triggered by 17a36bcfecd401d43df11f276bfd6b9259a9fa5d
7ab5fd2c8 is described below

commit 7ab5fd2c87757c7cac2eb7b81ece002675d0bf5e
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Jun 13 19:42:46 2025 +0000

    Publish built docs triggered by 17a36bcfecd401d43df11f276bfd6b9259a9fa5d
---
 _sources/user-guide/compatibility.md.txt | 11 +++++++++++
 _sources/user-guide/configs.md.txt       |  2 +-
 searchindex.js                           |  2 +-
 user-guide/compatibility.html            |  8 ++++++++
 user-guide/configs.html                  |  2 +-
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/_sources/user-guide/compatibility.md.txt b/_sources/user-guide/compatibility.md.txt
index 39cd3a058..961209075 100644
--- a/_sources/user-guide/compatibility.md.txt
+++ b/_sources/user-guide/compatibility.md.txt
@@ -50,6 +50,8 @@ implementation:
 The new scans currently have the following limitations:
+Issues common to both `native_datafusion` and `native_iceberg_compat`:
+
 - When reading Parquet files written by systems other than Spark that contain columns with the logical types `UINT_8` or `UINT_16`, Comet will produce different results than Spark because Spark does not preserve or understand these logical types. Arrow-based readers, such as DataFusion and Comet do respect these types and read the data as unsigned
@@ -58,12 +60,21 @@ types (regardless of the logical type). This behavior can be disabled by setting
 `spark.comet.scan.allowIncompatible=true`.
 - There is a known performance issue when pushing filters down to Parquet. See the [Comet Tuning Guide] for more information.
+- Reading maps containing complex types can result in errors or incorrect results [#1754]
+- `PARQUET_FIELD_ID_READ_ENABLED` is not respected [#1758]
 - There are failures in the Spark SQL test suite when enabling these new scans (tracking issues: [#1542] and [#1545]).
 - No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.
 - Setting Spark configs `ignoreMissingFiles` or `ignoreCorruptFiles` to `true` is not compatible with `native_datafusion` scan.
+Issues specific to `native_datafusion`:
+
+- Bucketed scans are not supported
+- No support for row indexes
+
 [#1545]: https://github.com/apache/datafusion-comet/issues/1545
 [#1542]: https://github.com/apache/datafusion-comet/issues/1542
+[#1754]: https://github.com/apache/datafusion-comet/issues/1754
+[#1758]: https://github.com/apache/datafusion-comet/issues/1758
 [Comet Tuning Guide]: tuning.md
 ## ANSI mode
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 517ce960c..f27830320 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -83,7 +83,7 @@ Comet provides the following configuration settings.
 | spark.comet.parquet.read.parallel.io.enabled | Whether to enable Comet's parallel reader for Parquet files. The parallel reader reads ranges of consecutive data in a file in parallel. It is faster for large files and row groups but uses more resources. | true |
 | spark.comet.parquet.read.parallel.io.thread-pool.size | The maximum number of parallel threads the parallel reader will use in a single executor. For executors configured with a smaller number of cores, use a smaller number. | 16 |
 | spark.comet.regexp.allowIncompatible | Comet is not currently fully compatible with Spark for all regular expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
-| spark.comet.scan.allowIncompatible | Comet is not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
+| spark.comet.scan.allowIncompatible | Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
 | spark.comet.scan.enabled | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | true |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
diff --git a/searchindex.js b/searchindex.js
index f304a4312..d8845b88c 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], "2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[13, null]], "Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, null]], "Adding Spark-side Tests for the New Expression": [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], "2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, "ansi-mode"]], "API Differences Between Spark Versions": [[0, "api-differences-between-spark-versions"]], "ASF Links": [[13, null]], "Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, null]], "Adding Spark-side Tests for the New Expression": [...]
\ No newline at end of file
diff --git a/user-guide/compatibility.html b/user-guide/compatibility.html
index a09f16ae5..7d1b19f68 100644
--- a/user-guide/compatibility.html
+++ b/user-guide/compatibility.html
@@ -429,6 +429,7 @@ implementation:</p>
 <li><p>Improves performance</p></li>
 </ul>
 <p>The new scans currently have the following limitations:</p>
+<p>Issues common to both <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code> and <code class="docutils literal notranslate"><span class="pre">native_iceberg_compat</span></code>:</p>
 <ul class="simple">
 <li><p>When reading Parquet files written by systems other than Spark that contain columns with the logical types <code class="docutils literal notranslate"><span class="pre">UINT_8</span></code> or <code class="docutils literal notranslate"><span class="pre">UINT_16</span></code>, Comet will produce different results than Spark because Spark does not preserve or understand these
@@ -438,10 +439,17 @@ types (regardless of the logical type). This behavior can be disabled by setting <code class="docutils literal notranslate"><span class="pre">spark.comet.scan.allowIncompatible=true</span></code>.</p></li>
 <li><p>There is a known performance issue when pushing filters down to Parquet. See the <a class="reference internal" href="tuning.html"><span class="std std-doc">Comet Tuning Guide</span></a> for more information.</p></li>
+<li><p>Reading maps containing complex types can result in errors or incorrect results <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1754">#1754</a></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">PARQUET_FIELD_ID_READ_ENABLED</span></code> is not respected <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1758">#1758</a></p></li>
 <li><p>There are failures in the Spark SQL test suite when enabling these new scans (tracking issues: <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1542">#1542</a> and <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</p></li>
 <li><p>No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.</p></li>
 <li><p>Setting Spark configs <code class="docutils literal notranslate"><span class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal notranslate"><span class="pre">ignoreCorruptFiles</span></code> to <code class="docutils literal notranslate"><span class="pre">true</span></code> is not compatible with <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code> scan.</p></li>
 </ul>
+<p>Issues specific to <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code>:</p>
+<ul class="simple">
+<li><p>Bucketed scans are not supported</p></li>
+<li><p>No support for row indexes</p></li>
+</ul>
 </section>
 <section id="ansi-mode">
 <h2>ANSI mode<a class="headerlink" href="#ansi-mode" title="Link to this heading">¶</a></h2>
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 43e518183..cdd589429 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -565,7 +565,7 @@ TO MODIFY THIS CONTENT MAKE SURE THAT YOU MAKE YOUR CHANGES TO THE TEMPLATE FILE
 <td><p>false</p></td>
 </tr>
 <tr class="row-even"><td><p>spark.comet.scan.allowIncompatible</p></td>
-<td><p>Comet is not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html).</p></td>
+<td><p>Some Comet scan implementations are not currently fully compatible with Spark for all datatypes. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html).</p></td>
 <td><p>false</p></td>
 </tr>
 <tr class="row-odd"><td><p>spark.comet.scan.enabled</p></td>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org
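
For context only, here is a minimal, hypothetical sketch (not taken from this commit) of how the settings referenced in the documentation changes above are typically applied to a Spark session in Scala. The jar placement, master, application name, and data path are placeholders; the config keys come from the configs table, and the plugin class follows the Comet user guide.

// Hypothetical illustration: enabling Comet scans and the
// spark.comet.scan.allowIncompatible escape hatch discussed above.
import org.apache.spark.sql.SparkSession

object CometScanConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("comet-scan-config-example") // placeholder name
      .master("local[*]")                   // placeholder; any cluster manager works
      // Comet is loaded as a Spark plugin; the Comet jar must be on the classpath
      // (for example, submitted with --jars).
      .config("spark.plugins", "org.apache.spark.CometPlugin")
      // Per the configs table, native vectorized execution needs both of these.
      .config("spark.comet.scan.enabled", "true")
      .config("spark.comet.exec.enabled", "true")
      // Opt in to scan behavior that is not fully Spark-compatible, e.g. Parquet
      // files containing UINT_8 / UINT_16 columns (see the compatibility guide).
      .config("spark.comet.scan.allowIncompatible", "true")
      // Optional knobs from the same table, shown with their documented defaults.
      .config("spark.comet.parquet.read.parallel.io.enabled", "true")
      .config("spark.comet.scan.preFetch.enabled", "false")
      .getOrCreate()

    // Placeholder path: with the settings above, supported Parquet reads go
    // through Comet's native scan.
    val df = spark.read.parquet("/tmp/example.parquet")
    df.show()

    spark.stop()
  }
}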