This is an automated email from the ASF dual-hosted git repository.
kevinjqliu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 54815073b1 Doc: Remove Spark 3 specific wordings in docs (#14357)
54815073b1 is described below
commit 54815073b1bf0335b9747d9cb85015ab6a586d90
Author: jackylee <[email protected]>
AuthorDate: Tue Nov 11 02:11:43 2025 +0800
Doc: Remove Spark 3 specific wordings in docs (#14357)
---
docs/docs/spark-getting-started.md | 8 ++++----
docs/docs/spark-procedures.md | 4 +++-
docs/docs/spark-queries.md | 8 ++------
docs/docs/spark-structured-streaming.md | 4 +---
docs/docs/spark-writes.md | 34 ++++++++++++++++-----------------
site/docs/spark-quickstart.md | 12 ++++++------
site/mkdocs.yml | 1 +
7 files changed, 34 insertions(+), 37 deletions(-)
diff --git a/docs/docs/spark-getting-started.md b/docs/docs/spark-getting-started.md
index 273be539e4..6813c76937 100644
--- a/docs/docs/spark-getting-started.md
+++ b/docs/docs/spark-getting-started.md
@@ -26,17 +26,17 @@ Spark is currently the most feature-rich compute engine for Iceberg operations.
We recommend getting started with Spark to understand Iceberg concepts and features with examples.
You can also view documentation for using Iceberg with other compute engines under the [Multi-Engine Support](../../multi-engine-support.md) page.
-## Using Iceberg in Spark 3
+## Using Iceberg in Spark
To use Iceberg in a Spark shell, use the `--packages` option:
```sh
-spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
!!! info
<!-- markdown-link-check-disable-next-line -->
-    If you want to include Iceberg in your Spark installation, add the [`iceberg-spark-runtime-3.5_2.12` Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.5_2.12-{{ icebergVersion }}.jar) to Spark's `jars` folder.
+    If you want to include Iceberg in your Spark installation, add the [`iceberg-spark-runtime-{{ sparkVersionMajor }}` Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-{{ sparkVersionMajor }}/{{ icebergVersion }}/iceberg-spark-runtime-{{ sparkVersionMajor }}-{{ icebergVersion }}.jar) to Spark's `jars` folder.
### Adding catalogs
@@ -45,7 +45,7 @@ Iceberg comes with [catalogs](spark-configuration.md#catalogs) that enable SQL c
This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog:
```sh
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}\
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
diff --git a/docs/docs/spark-procedures.md b/docs/docs/spark-procedures.md
index 6f919ec29f..7f211d9f26 100644
--- a/docs/docs/spark-procedures.md
+++ b/docs/docs/spark-procedures.md
@@ -20,7 +20,9 @@ title: "Procedures"
# Spark Procedures
-To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md). Stored procedures are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions) in Spark 3.
+To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md).
+For Spark 3.x, stored procedures are only available when using the [Iceberg SQL extensions](spark-configuration.md#sql-extensions).
+For Spark 4.0, stored procedures are supported natively without requiring the Iceberg SQL extensions. However, note that procedure names are __case-sensitive__ in Spark 4.0.
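As an illustrative sketch (the `prod` catalog, `db.sample` table, and snapshot id below are placeholder values), a call to a built-in procedure such as `rollback_to_snapshot` looks the same under both mechanisms, but in Spark 4.0 the procedure name must match its lowercase form:

```sql
-- Roll back db.sample to an earlier snapshot; named arguments use the => syntax.
-- In Spark 4.0 the procedure name is case-sensitive: `rollback_to_snapshot`
-- resolves, while `ROLLBACK_TO_SNAPSHOT` would not.
CALL prod.system.rollback_to_snapshot(table => 'db.sample', snapshot_id => 10963874102873);
```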
## Usage
diff --git a/docs/docs/spark-queries.md b/docs/docs/spark-queries.md
index a67f53321c..41189d05ff 100644
--- a/docs/docs/spark-queries.md
+++ b/docs/docs/spark-queries.md
@@ -24,7 +24,7 @@ To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md
## Querying with SQL
-In Spark 3, tables use identifiers that include a [catalog name](spark-configuration.md#using-catalogs).
+In Spark, tables use identifiers that include a [catalog name](spark-configuration.md#using-catalogs).
```sql
SELECT * FROM prod.db.table; -- catalog: prod, namespace: db, table: table
@@ -45,7 +45,7 @@ SELECT * FROM prod.db.table.files;
| 0 | s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 0 | {1999-01-01, 03} | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null | [4] | null | null |
### Time travel Queries with SQL
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+Spark supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
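For example (reusing the placeholder `prod.db.table` identifier from above), the two clauses can be used as follows:

```sql
-- Time travel to a point in time
SELECT * FROM prod.db.table TIMESTAMP AS OF '2021-10-01 00:00:00';

-- Time travel to a specific snapshot id
SELECT * FROM prod.db.table VERSION AS OF 10963874102873;

-- VERSION AS OF also accepts a branch or tag name
SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
```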
!!! info
@@ -180,10 +180,6 @@ spark.read
.load("path/to/table")
```
-!!! info
-    Spark 3.0 and earlier versions do not support using `option` with `table` in DataFrameReader commands. All options will be silently
-    ignored. Do not use `table` when attempting to time-travel or use other options. See [SPARK-32592](https://issues.apache.org/jira/browse/SPARK-32592).
-
### Incremental read
To read appended data incrementally, use:
diff --git a/docs/docs/spark-structured-streaming.md b/docs/docs/spark-structured-streaming.md
index dd569bc6a5..e722df1ea4 100644
--- a/docs/docs/spark-structured-streaming.md
+++ b/docs/docs/spark-structured-streaming.md
@@ -76,8 +76,6 @@ data.writeStream
.toTable("database.table_name")
```
-If you're using Spark 3.0 or earlier, you need to use `.option("path", "database.table_name").start()`, instead of `.toTable("database.table_name")`.
-
In the case of the directory-based Hadoop catalog:
```scala
@@ -101,7 +99,7 @@ Iceberg doesn't support experimental [continuous processing](https://spark.apach
### Partitioned table
-Iceberg requires sorting data by partition per task prior to writing the data. In Spark tasks are split by Spark partition.
+Iceberg requires sorting data by partition per task prior to writing the data. In Spark tasks are split by Spark partition
against partitioned table. For batch queries you're encouraged to do explicit sort to fulfill the requirement
(see [here](spark-writes.md#writing-distribution-modes)), but the approach would bring additional latency as
repartition and sort are considered as heavy operations for streaming workload. To avoid additional latency, you can
diff --git a/docs/docs/spark-writes.md b/docs/docs/spark-writes.md
index 87cf6bc299..f224894a45 100644
--- a/docs/docs/spark-writes.md
+++ b/docs/docs/spark-writes.md
@@ -22,25 +22,25 @@ title: "Writes"
To use Iceberg in Spark, first configure [Spark catalogs](spark-configuration.md).
-Some plans are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions) in Spark 3.
+Some plans are only available when using [Iceberg SQL extensions](spark-configuration.md#sql-extensions).
Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions:
-| Feature support                                  | Spark 3   | Notes                                                                       |
-|--------------------------------------------------|-----------|-----------------------------------------------------------------------------|
-| [SQL insert into](#insert-into)                  | ✔️        | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
-| [SQL merge into](#merge-into)                    | ✔️        | ⚠ Requires Iceberg Spark extensions                                         |
-| [SQL insert overwrite](#insert-overwrite)        | ✔️        | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
-| [SQL delete from](#delete-from)                  | ✔️        | ⚠ Row-level delete requires Iceberg Spark extensions                        |
-| [SQL update](#update)                            | ✔️        | ⚠ Requires Iceberg Spark extensions                                         |
-| [DataFrame append](#appending-data)              | ✔️        |                                                                             |
-| [DataFrame overwrite](#overwriting-data)         | ✔️        |                                                                             |
-| [DataFrame CTAS and RTAS](#creating-tables)      | ✔️        | ⚠ Requires DSv2 API                                                         |
-| [DataFrame merge into](#merging-data)            | ✔️        | ⚠ Requires DSv2 API (Spark 4.0 and later)                                   |
+| Feature support                                  | Spark   | Notes                                                                       |
+|--------------------------------------------------|---------|-----------------------------------------------------------------------------|
+| [SQL insert into](#insert-into)                  | ✔️      | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
+| [SQL merge into](#merge-into)                    | ✔️      | ⚠ Requires Iceberg Spark extensions                                         |
+| [SQL insert overwrite](#insert-overwrite)        | ✔️      | ⚠ Requires `spark.sql.storeAssignmentPolicy=ANSI` (default since Spark 3.0) |
+| [SQL delete from](#delete-from)                  | ✔️      | ⚠ Row-level delete requires Iceberg Spark extensions                        |
+| [SQL update](#update)                            | ✔️      | ⚠ Requires Iceberg Spark extensions                                         |
+| [DataFrame append](#appending-data)              | ✔️      |                                                                             |
+| [DataFrame overwrite](#overwriting-data)         | ✔️      |                                                                             |
+| [DataFrame CTAS and RTAS](#creating-tables)      | ✔️      | ⚠ Requires DSv2 API                                                         |
+| [DataFrame merge into](#merging-data)            | ✔️      | ⚠ Requires DSv2 API (Spark 4.0 and later)                                   |
## Writing with SQL
-Spark 3 supports SQL `INSERT INTO`, `MERGE INTO`, and `INSERT OVERWRITE`, as well as the new `DataFrameWriterV2` API.
+Spark supports SQL `INSERT INTO`, `MERGE INTO`, and `INSERT OVERWRITE`, as well as the new `DataFrameWriterV2` API.
### `INSERT INTO`
@@ -55,7 +55,7 @@ INSERT INTO prod.db.table SELECT ...
### `MERGE INTO`
-Spark 3 added support for `MERGE INTO` queries that can express row-level updates.
+Spark supports `MERGE INTO` queries that can express row-level updates.
Iceberg supports `MERGE INTO` by rewriting data files that contain rows that need to be updated in an `overwrite` commit.
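A minimal sketch of the shape of such a query (the `target` and `updates` table names and their columns are placeholders, not part of this change):

```sql
MERGE INTO prod.db.target t              -- target table, aliased as t
USING (SELECT * FROM prod.db.updates) s  -- source of changes, aliased as s
ON t.id = s.id                           -- match rows by id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.count = t.count + s.count
WHEN NOT MATCHED THEN INSERT *
```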
@@ -161,7 +161,7 @@ Note that this mode cannot replace hourly partitions like the dynamic example qu
### `DELETE FROM`
-Spark 3 added support for `DELETE FROM` queries to remove data from tables.
+Spark supports `DELETE FROM` queries to remove data from tables.
Delete queries accept a filter to match rows to delete.
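For instance (placeholder table and column names), a filtered delete looks like:

```sql
-- Remove all rows in a one-month window; when the filter aligns with whole
-- partitions, Iceberg can drop the matching data files without rewriting them
DELETE FROM prod.db.table
WHERE ts >= '2020-05-01 00:00:00' AND ts < '2020-06-01 00:00:00';
```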
@@ -253,7 +253,7 @@ data.writeTo("prod.db.table.branch_audit").overwritePartitions()
## Writing with DataFrames
-Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
+Spark introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
* CTAS, RTAS, and overwrite by filter are supported
* All operations consistently write columns to a table by name
@@ -268,7 +268,7 @@ Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using d
The v1 DataFrame `write` API is still supported, but is not recommended.
!!! danger
-    When writing with the v1 DataFrame API in Spark 3, use `saveAsTable` or `insertInto` to load tables with a catalog.
+    When writing with the v1 DataFrame API in Spark, use `saveAsTable` or `insertInto` to load tables with a catalog.
    Using `format("iceberg")` loads an isolated table reference that will not automatically refresh tables used by queries.
### Appending data
diff --git a/site/docs/spark-quickstart.md b/site/docs/spark-quickstart.md
index 262a03c581..c8f9dd2b3a 100644
--- a/site/docs/spark-quickstart.md
+++ b/site/docs/spark-quickstart.md
@@ -274,7 +274,7 @@ This configuration creates a path-based catalog named `local` for tables under `
=== "CLI"
```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}\
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
@@ -287,7 +287,7 @@ This configuration creates a path-based catalog named `local` for tables under `
=== "spark-defaults.conf"
```sh
-    spark.jars.packages                    org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark.jars.packages                    org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
    spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.spark_catalog        org.apache.iceberg.spark.SparkSessionCatalog
    spark.sql.catalog.spark_catalog.type   hive
@@ -309,19 +309,19 @@ If you already have a Spark environment, you can add Iceberg, using the `--packa
=== "SparkSQL"
```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
=== "Spark-Shell"
```sh
-    spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
=== "PySpark"
```sh
-    pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    pyspark --packages org.apache.iceberg:iceberg-spark-runtime-{{ sparkVersionMajor }}:{{ icebergVersion }}
```
!!! note
@@ -329,7 +329,7 @@ If you already have a Spark environment, you can add Iceberg, using the `--packa
    You can download the runtime by visiting the [Releases](releases.md) page.
<!-- markdown-link-check-disable-next-line -->
-[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.5_2.12-{{ icebergVersion }}.jar
+[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-{{ sparkVersionMajor }}/{{ icebergVersion }}/iceberg-spark-runtime-{{ sparkVersionMajor }}-{{ icebergVersion }}.jar
#### Learn More
diff --git a/site/mkdocs.yml b/site/mkdocs.yml
index d4877b1ede..ebf6006b67 100644
--- a/site/mkdocs.yml
+++ b/site/mkdocs.yml
@@ -88,6 +88,7 @@ extra:
nessieVersion: '0.104.5'
flinkVersion: '2.0.0'
flinkVersionMajor: '2.0'
+ sparkVersionMajor: '4.0_2.13'
social:
- icon: fontawesome/regular/comments
link: 'https://iceberg.apache.org/community/'