This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 9843597fd Publish built docs triggered by a6b340e4bc988094aae90767eb9f8dc85f441598
9843597fd is described below
commit 9843597fd84f0a98947a35588be9cb87bb609cb9
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Mar 2 22:13:57 2026 +0000
Publish built docs triggered by a6b340e4bc988094aae90767eb9f8dc85f441598
---
_sources/contributor-guide/parquet_scans.md.txt | 8 ++++----
_sources/user-guide/latest/configs.md.txt | 2 --
contributor-guide/parquet_scans.html | 8 ++++----
searchindex.js | 2 +-
user-guide/latest/configs.html | 8 --------
5 files changed, 9 insertions(+), 19 deletions(-)
diff --git a/_sources/contributor-guide/parquet_scans.md.txt b/_sources/contributor-guide/parquet_scans.md.txt
index 7df939488..c8e960a15 100644
--- a/_sources/contributor-guide/parquet_scans.md.txt
+++ b/_sources/contributor-guide/parquet_scans.md.txt
@@ -49,10 +49,10 @@ The following features are not supported by either scan implementation, and Come
 The following shared limitation may produce incorrect results without falling back to Spark:
-- No support for datetime rebasing detection or the `spark.comet.exceptionOnDatetimeRebase` configuration. When
-  reading Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid
-  Julian/Gregorian calendar), dates/timestamps will be read as if they were written using the Proleptic Gregorian
-  calendar. This may produce incorrect results for dates before October 15, 1582.
+- No support for datetime rebasing. When reading Parquet files containing dates or timestamps written before
+  Spark 3.0 (which used a hybrid Julian/Gregorian calendar), dates/timestamps will be read as if they were
+  written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+  October 15, 1582.
 The `native_datafusion` scan has some additional limitations, mostly related to Parquet metadata. All of these
 cause Comet to fall back to Spark.
diff --git a/_sources/user-guide/latest/configs.md.txt b/_sources/user-guide/latest/configs.md.txt
index 9a3accc0c..1a4f7e7cd 100644
--- a/_sources/user-guide/latest/configs.md.txt
+++ b/_sources/user-guide/latest/configs.md.txt
@@ -30,8 +30,6 @@ Comet provides the following configuration settings.
 | `spark.comet.scan.enabled` | Whether to enable native scans. When this is turned on, Spark will use Comet to read supported data sources (currently only Parquet is supported natively). Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. | true |
 | `spark.comet.scan.icebergNative.dataFileConcurrencyLimit` | The number of Iceberg data files to read concurrently within a single task. Higher values improve throughput for tables with many small files by overlapping I/O latency, but increase memory usage. Values between 2 and 8 are suggested. | 1 |
 | `spark.comet.scan.icebergNative.enabled` | Whether to enable native Iceberg table scan using iceberg-rust. When enabled, Iceberg tables are read directly through native execution, bypassing Spark's DataSource V2 API for better performance. | false |
-| `spark.comet.scan.preFetch.enabled` | Whether to enable pre-fetching feature of CometScan. | false |
-| `spark.comet.scan.preFetch.threadNum` | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
 | `spark.comet.scan.unsignedSmallIntSafetyCheck` | Parquet files may contain unsigned 8-bit integers (UINT_8) which Spark maps to ShortType. When this config is true (default), Comet falls back to Spark for ShortType columns because we cannot distinguish signed INT16 (safe) from unsigned UINT_8 (may produce different results). Set to false to allow native execution of ShortType columns if you know your data does not contain unsigned UINT_8 columns from improperly encoded Parquet files. F [...]
 | `spark.hadoop.fs.comet.libhdfs.schemes` | Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas. Valid only when built with hdfs feature enabled. | |
 <!-- prettier-ignore-end -->
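[Editor's note] The configuration keys in the table above are ordinary Spark confs. A minimal sketch of enabling Comet native scan and execution on a job, using only keys that appear in this table plus the documented `org.apache.spark.CometPlugin` plugin class; the jar path and script name are illustrative placeholders, not values from this commit:

```shell
# Hedged sketch: enable Comet native scans and vectorized execution.
# Both spark.comet.scan.enabled and spark.comet.exec.enabled must be on
# for native vectorized execution, per the table above.
spark-submit \
  --jars /path/to/comet-spark.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.scan.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.scan.icebergNative.dataFileConcurrencyLimit=4 \
  your_job.py
```

The concurrency limit of 4 is simply a value within the 2–8 range the table suggests for tables with many small files.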
diff --git a/contributor-guide/parquet_scans.html b/contributor-guide/parquet_scans.html
index 52544b2ea..5e4364e51 100644
--- a/contributor-guide/parquet_scans.html
+++ b/contributor-guide/parquet_scans.html
@@ -496,10 +496,10 @@ V2 API for Parquet scans. The DataFusion-based implementations only support the
 </ul>
 <p>The following shared limitation may produce incorrect results without falling back to Spark:</p>
 <ul class="simple">
-<li><p>No support for datetime rebasing detection or the <code class="docutils literal notranslate"><span class="pre">spark.comet.exceptionOnDatetimeRebase</span></code> configuration. When
-reading Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid
-Julian/Gregorian calendar), dates/timestamps will be read as if they were written using the Proleptic Gregorian
-calendar. This may produce incorrect results for dates before October 15, 1582.</p></li>
+<li><p>No support for datetime rebasing. When reading Parquet files containing dates or timestamps written before
+Spark 3.0 (which used a hybrid Julian/Gregorian calendar), dates/timestamps will be read as if they were
+written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+October 15, 1582.</p></li>
 </ul>
 <p>The <code class="docutils literal notranslate"><span class="pre">native_datafusion</span></code> scan has some additional limitations, mostly related to Parquet metadata. All of these
 cause Comet to fall back to Spark.</p>
diff --git a/searchindex.js b/searchindex.js
index dd309a87a..5b9e77fb9 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12, "format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff": [[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, "comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12, "format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2. Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff": [[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4, "sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4, "comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
diff --git a/user-guide/latest/configs.html b/user-guide/latest/configs.html
index eaaca6158..a83ba5f1b 100644
--- a/user-guide/latest/configs.html
+++ b/user-guide/latest/configs.html
@@ -485,14 +485,6 @@ under the License.
 <td><p>Whether to enable native Iceberg table scan using iceberg-rust. When enabled, Iceberg tables are read directly through native execution, bypassing Spark’s DataSource V2 API for better performance.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">spark.comet.scan.preFetch.enabled</span></code></p></td>
-<td><p>Whether to enable pre-fetching feature of CometScan.</p></td>
-<td><p>false</p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">spark.comet.scan.preFetch.threadNum</span></code></p></td>
-<td><p>The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups.</p></td>
-<td><p>2</p></td>
-</tr>
 <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck</span></code></p></td>
 <td><p>Parquet files may contain unsigned 8-bit integers (UINT_8) which Spark maps to ShortType. When this config is true (default), Comet falls back to Spark for ShortType columns because we cannot distinguish signed INT16 (safe) from unsigned UINT_8 (may produce different results). Set to false to allow native execution of ShortType columns if you know your data does not contain unsigned UINT_8 columns from improperly encoded Parquet files. For more information, refer to the <a class=" [...]
 <td><p>true</p></td>
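[Editor's note] The datetime rebasing limitation documented in the parquet_scans change above can be illustrated without any Spark or Comet code. A minimal sketch in plain Python using the standard Julian Day Number (JDN) formulas: a date written as days-since-epoch under the hybrid Julian/Gregorian calendar (Spark < 3.0 writers) and read back as proleptic Gregorian lands on a different day. The helper names are this example's own, not Comet APIs:

```python
# Illustrate the rebasing problem: same day count, two calendars,
# two different dates for anything written before October 15, 1582.
from datetime import date, timedelta

EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 (proleptic Gregorian)

def jdn_gregorian(year: int, month: int, day: int) -> int:
    """Julian Day Number of a proleptic Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def jdn_julian(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

# A legacy writer records Julian-calendar 1500-02-28 as days since epoch.
days_since_epoch = jdn_julian(1500, 2, 28) - EPOCH_JDN
# A reader with no rebasing support interprets that integer as proleptic
# Gregorian (Python's date arithmetic is proleptic Gregorian).
as_read = date(1970, 1, 1) + timedelta(days=days_since_epoch)
print(as_read)  # 1500-03-09: a 9-day shift from the intended 1500-02-28
```

As a sanity check, Julian 1582-10-04 and Gregorian 1582-10-15 are consecutive days (JDN 2299160 and 2299161), which is exactly the calendar cutover the limitation refers to.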
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]