This is an automated email from the ASF dual-hosted git repository.
kevinjqliu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 54815073b1 Doc: Remove Spark 3 specific wordings in docs (#14357)
54815073b1 is described below
commit 54815073b1bf0335b9747d9cb85015ab6a586d90
Author: jackylee <[email protected]>
AuthorDate: Tue Nov 11 02:11:43 2025 +0800
Doc: Remove Spark 3 specific wordings in docs (#14357)
---
docs/docs/spark-getting-started.md | 8 ++++----
docs/docs/spark-procedures.md | 4 +++-
docs/docs/spark-queries.md | 8 ++------
docs/docs/spark-structured-streaming.md | 4 +---
docs/docs/spark-writes.md | 34 ++++++++++++++++-----------------
site/docs/spark-quickstart.md | 12 ++++++------
site/mkdocs.yml | 1 +
7 files changed, 34 insertions(+), 37 deletions(-)
diff --git a/docs/docs/spark-getting-started.md b/docs/docs/spark-getting-started.md
index 273be539e4..6813c76937 100644
--- a/docs/docs/spark-getting-started.md
+++ b/docs/docs/spark-getting-started.md
@@ -26,17 +26,17 @@ Spark is currently the most feature-rich compute engine for Iceberg operations.
We recommend getting started with Spark to understand Iceberg concepts and features with examples.
You can also view documentation for using Iceberg with other compute engines under the [Multi-Engine Support](../../multi-engine-support.md) page.
-## Using Iceberg in Spark 3
+## Using Iceberg in Spark
To use Iceberg in a Spark shell, use the `--packages` option:
```sh
-spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
!!! info
<!-- markdown-link-check-disable-next-line -->
-    If you want to include Iceberg in your Spark installation, add the [`iceberg-spark-runtime-3.5_2.12` Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.5_2.12-{{ icebergVersion }}.jar) to Spark's `jars` folder.
+    If you want to include Iceberg in your Spark installation, add the [`iceberg-spark-runtime-{{ sparkVersionMajor }}` Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-{{ sparkVersionMajor }}/{{ icebergVersion }}/iceberg-spark-runtime-{{ sparkVersionMajor }}-{{ icebergVersion }}.jar) to Spark's `jars` folder.
### Adding catalogs
@@ -45,7 +45,7 @@ Iceberg comes with [catalogs](spark-configuration.md#catalogs) that enable SQL c
This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog:
```sh
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}\
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
diff --git a/docs/docs/spark-procedures.md b/docs/docs/spark-procedures.md
index 6f919ec29f..7f211d9f26 100644
--- a/docs/docs/spark-procedures.md
+++ b/docs/docs/spark-procedures.md
@@ -20,7 +20,9 @@ title: "Procedures"
# Spark Procedures
-To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md). Stored procedures are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions) in Spark 3.
+To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md).
+For Spark 3.x, stored procedures are only available when using the [Iceberg SQL extensions](spark-configuration.md#sql-extensions).
+For Spark 4.0, stored procedures are supported natively without requiring the Iceberg SQL extensions. However, note that procedure names are __case-sensitive__ in Spark 4.0.
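As an illustrative sketch (the `prod` catalog, `db.sample` table, and snapshot id below are placeholder values), a call to a built-in procedure such as `rollback_to_snapshot` looks the same under both mechanisms, but in Spark 4.0 the procedure name must match its lowercase form:

```sql
-- Roll back db.sample to an earlier snapshot; named arguments use the => syntax.
-- In Spark 4.0 the procedure name is case-sensitive: `rollback_to_snapshot`
-- resolves, while `ROLLBACK_TO_SNAPSHOT` would not.
CALL prod.system.rollback_to_snapshot(table => 'db.sample', snapshot_id => 10963874102873);
```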
## Usage
diff --git a/docs/docs/spark-queries.md b/docs/docs/spark-queries.md
index a67f53321c..41189d05ff 100644
--- a/docs/docs/spark-queries.md
+++ b/docs/docs/spark-queries.md
@@ -24,7 +24,7 @@ To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md
## Querying with SQL
-In Spark 3, tables use identifiers that include a [catalog name](spark-configuration.md#using-catalogs).
+In Spark, tables use identifiers that include a [catalog name](spark-configuration.md#using-catalogs).
```sql
SELECT * FROM prod.db.table; -- catalog: prod, namespace: db, table: table
@@ -45,7 +45,7 @@ SELECT * FROM prod.db.table.files;
| 0 | s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 0 | {1999-01-01, 03} | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null | [4] | null | null |
### Time travel Queries with SQL
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+Spark supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
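For example (reusing the placeholder `prod.db.table` identifier from above), the two clauses can be used as follows:

```sql
-- Time travel to a point in time
SELECT * FROM prod.db.table TIMESTAMP AS OF '2021-10-01 00:00:00';

-- Time travel to a specific snapshot id
SELECT * FROM prod.db.table VERSION AS OF 10963874102873;

-- VERSION AS OF also accepts a branch or tag name
SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
```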
!!! info
@@ -180,10 +180,6 @@ spark.read
.load("path/to/table")
```
-!!! info
-    Spark 3.0 and earlier versions do not support using `option` with `table` in DataFrameReader commands. All options will be silently
-    ignored. Do not use `table` when attempting to time-travel or use other options. See [SPARK-32592](https://issues.apache.org/jira/browse/SPARK-32592).
-
### Incremental read
To read appended data incrementally, use:
diff --git a/docs/docs/spark-structured-streaming.md b/docs/docs/spark-structured-streaming.md
index dd569bc6a5..e722df1ea4 100644
--- a/docs/docs/spark-structured-streaming.md
+++ b/docs/docs/spark-structured-streaming.md
@@ -76,8 +76,6 @@ data.writeStream
.toTable("database.table_name")
```
-If you're using Spark 3.0 or earlier, you need to use `.option("path", "database.table_name").start()`, instead of `.toTable("database.table_name")`.
-
In the case of the directory-based Hadoop catalog:
```scala
@@ -101,7 +99,7 @@ Iceberg doesn't support experimental [continuous processing](https://spark.apach
### Partitioned table
-Iceberg requires sorting data by partition per task prior to writing the data. In Spark tasks are split by Spark partition.
+Iceberg requires sorting data by partition per task prior to writing the data. In Spark tasks are split by Spark partition
against partitioned table. For batch queries you're encouraged to do explicit sort to fulfill the requirement
(see [here](spark-writes.md#writing-distribution-modes)), but the approach would bring additional latency as
repartition and sort are considered as heavy operations for streaming workload. To avoid additional latency, you can
diff --git a/docs/docs/spark-writes.md b/docs/docs/spark-writes.md
index 87cf6bc299..f224894a45 100644
--- a/docs/docs/spark-writes.md
+++ b/docs/docs/spark-writes.md
@@ -22,25 +22,25 @@ title: "Writes"
To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md).
-Some plans are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions) in Spark 3.
+Some plans are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions).
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions:
-| Feature support                                  | Spark 3   | Notes                                                                       |
-|--------------------------------------------------|-----------|-----------------------------------------------------------------------------|
-| [SQL insert into](#insert-into)                  | ✔️        | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
-| [SQL merge into](#merge-into)                    | ✔️        | ⚠ Requires Iceberg Spark extensions                                         |
-| [SQL insert overwrite](#insert-overwrite)        | ✔️        | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
-| [SQL delete from](#delete-from)                  | ✔️        | ⚠ Row-level delete requires Iceberg Spark extensions                        |
-| [SQL update](#update)                            | ✔️        | ⚠ Requires Iceberg Spark extensions                                         |
-| [DataFrame append](#appending-data)              | ✔️        |                                                                             |
-| [DataFrame overwrite](#overwriting-data)         | ✔️        |                                                                             |
-| [DataFrame CTAS and RTAS](#creating-tables)      | ✔️        | ⚠ Requires DSv2 API                                                         |
-| [DataFrame merge into](#merging-data)            | ✔️        | ⚠ Requires DSv2 API (Spark 4.0 and later)                                   |
+| Feature support                                  | Spark   | Notes                                                                       |
+|--------------------------------------------------|---------|-----------------------------------------------------------------------------|
+| [SQL insert into](#insert-into)                  | ✔️      | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
+| [SQL merge into](#merge-into)                    | ✔️      | ⚠ Requires Iceberg Spark extensions                                         |
+| [SQL insert overwrite](#insert-overwrite)        | ✔️      | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
+| [SQL delete from](#delete-from)                  | ✔️      | ⚠ Row-level delete requires Iceberg Spark extensions                        |
+| [SQL update](#update)                            | ✔️      | ⚠ Requires Iceberg Spark extensions                                         |
+| [DataFrame append](#appending-data)              | ✔️      |                                                                             |
+| [DataFrame overwrite](#overwriting-data)         | ✔️      |                                                                             |
+| [DataFrame CTAS and RTAS](#creating-tables)      | ✔️      | ⚠ Requires DSv2 API                                                         |
+| [DataFrame merge into](#merging-data)            | ✔️      | ⚠ Requires DSv2 API (Spark 4.0 and later)                                   |
## Writing with SQL
-Spark 3 supports SQL `INSERT INTO`, `MERGE INTO`, and `INSERT OVERWRITE`, as well as the new `DataFrameWriterV2` API.
+Spark supports SQL `INSERT INTO`, `MERGE INTO`, and `INSERT OVERWRITE`, as well as the new `DataFrameWriterV2` API.
### `INSERT INTO`
@@ -55,7 +55,7 @@ INSERT INTO prod.db.table SELECT ...
### `MERGE INTO`
-Spark 3 added support for `MERGE INTO` queries that can express row-level updates.
+Spark supports `MERGE INTO` queries that can express row-level updates.
Iceberg supports `MERGE INTO` by rewriting data files that contain rows that need to be updated in an `overwrite` commit.
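A minimal sketch of the shape of such a query (the `target` and `updates` table names and their columns are placeholders, not part of this change):

```sql
MERGE INTO prod.db.target t              -- target table, aliased as t
USING (SELECT * FROM prod.db.updates) s  -- source of changes, aliased as s
ON t.id = s.id                           -- match rows by id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.count = t.count + s.count
WHEN NOT MATCHED THEN INSERT *
```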
@@ -161,7 +161,7 @@ Note that this mode cannot replace hourly partitions like the dynamic example qu
### `DELETE FROM`
-Spark 3 added support for `DELETE FROM` queries to remove data from tables.
+Spark supports `DELETE FROM` queries to remove data from tables.
Delete queries accept a filter to match rows to delete.
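For instance (placeholder table and column names), a filtered delete looks like:

```sql
-- Remove all rows in a one-month window; when the filter aligns with whole
-- partitions, Iceberg can drop the matching data files without rewriting them
DELETE FROM prod.db.table
WHERE ts >= '2020-05-01 00:00:00' AND ts < '2020-06-01 00:00:00';
```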
@@ -253,7 +253,7 @@ data.writeTo("prod.db.table.branch_audit").overwritePartitions()
## Writing with DataFrames
-Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
+Spark introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
* CTAS, RTAS, and overwrite by filter are supported
* All operations consistently write columns to a table by name
@@ -268,7 +268,7 @@ Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using d
The v1 DataFrame `write` API is still supported, but is not recommended.
!!! danger
-    When writing with the v1 DataFrame API in Spark 3, use `saveAsTable` or `insertInto` to load tables with a catalog.
+    When writing with the v1 DataFrame API in Spark, use `saveAsTable` or `insertInto` to load tables with a catalog.
    Using `format("iceberg")` loads an isolated table reference that will not automatically refresh tables used by queries.
### Appending data
diff --git a/site/docs/spark-quickstart.md b/site/docs/spark-quickstart.md
index 262a03c581..c8f9dd2b3a 100644
--- a/site/docs/spark-quickstart.md
+++ b/site/docs/spark-quickstart.md
@@ -274,7 +274,7 @@ This configuration creates a path-based catalog named `local` for tables under `
=== "CLI"
```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}\
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
@@ -287,7 +287,7 @@ This configuration creates a path-based catalog named `local` for tables under `
=== "spark-defaults.conf"
```sh
-    spark.jars.packages                    org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark.jars.packages                    org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
    spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.spark_catalog        org.apache.iceberg.spark.SparkSessionCatalog
    spark.sql.catalog.spark_catalog.type   hive
@@ -309,19 +309,19 @@ If you already have a Spark environment, you can add Iceberg, using the `--packa
=== "SparkSQL"
```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
=== "Spark-Shell"
```sh
-    spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
=== "PySpark"
```sh
-    pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    pyspark --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
!!! note
@@ -329,7 +329,7 @@ If you already have a Spark environment, you can add Iceberg, using the `--packa
    You can download the runtime by visiting the [Releases](releases.md) page.
<!-- markdown-link-check-disable-next-line -->
-[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.5_2.12-{{ icebergVersion }}.jar
+[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-{{ sparkVersionMajor }}/{{ icebergVersion }}/iceberg-spark-runtime-{{ sparkVersionMajor }}-{{ icebergVersion }}.jar
#### Learn More
diff --git a/site/mkdocs.yml b/site/mkdocs.yml
index d4877b1ede..ebf6006b67 100644
--- a/site/mkdocs.yml
+++ b/site/mkdocs.yml
@@ -88,6 +88,7 @@ extra:
nessieVersion: '0.104.5'
flinkVersion: '2.0.0'
flinkVersionMajor: '2.0'
+ sparkVersionMajor: '4.0_2.13'
social:
- icon: fontawesome/regular/comments
link: 'https://iceberg.apache.org/community/'