This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push:
new 67da720 [SPARK-53933] Add `Apache Iceberg` example
67da720 is described below
commit 67da720eaf7df2298858747a4aa3bae87528141d
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Oct 15 23:27:47 2025 -0700
[SPARK-53933] Add `Apache Iceberg` example
### What changes were proposed in this pull request?
This PR aims to add an `Apache Iceberg` example.
### Why are the changes needed?
To provide a working example of `Apache Spark 4.0.1` and `Apache Iceberg 1.10.0`.
1. Prepare the storage
```
$ kubectl apply -f examples/localstack.yml
```
2. Launch `Spark Connect Server` with the `Apache Iceberg` settings.
```
$ kubectl apply -f examples/spark-connect-server-iceberg.yaml
```
3. Set up port-forwarding to test
```
$ kubectl port-forward spark-connect-server-iceberg-0-driver 15002
```
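Before opening the shell, a quick local check can confirm that the forwarded port is accepting connections. This is a hypothetical helper, not part of this PR:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# While `kubectl port-forward ... 15002` is running, this should report True.
print("connect server reachable:", port_open("localhost", 15002))
```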
4. Test with the `Apache Iceberg Spark Quickstart` guide.
- https://iceberg.apache.org/spark-quickstart/
```scala
$ bin/spark-connect-shell --remote sc://localhost:15002
scala> sql("""CREATE TABLE taxis(vendor_id bigint, trip_id bigint, trip_distance float, fare_amount double, store_and_fwd_flag string) PARTITIONED BY (vendor_id);""").show()
scala> sql("""INSERT INTO taxis VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');""").show()
scala> sql("SELECT * FROM taxis").show(false)
+---------+-------+-------------+-----------+------------------+
|vendor_id|trip_id|trip_distance|fare_amount|store_and_fwd_flag|
+---------+-------+-------------+-----------+------------------+
|1 |1000374|8.4 |42.13 |Y |
|1 |1000371|1.8 |15.32 |N |
|2 |1000372|2.5 |22.15 |N |
|2 |1000373|0.9 |9.01 |N |
+---------+-------+-------------+-----------+------------------+
scala> sql("SELECT * FROM taxis.history").show(false)
+-----------------------+-------------------+---------+-------------------+
|made_current_at |snapshot_id |parent_id|is_current_ancestor|
+-----------------------+-------------------+---------+-------------------+
|2025-10-16 03:53:04.063|6463217948421571140|NULL |true |
+-----------------------+-------------------+---------+-------------------+
scala> sql("SELECT * FROM taxis VERSION AS OF 6463217948421571140").show(false)
+---------+-------+-------------+-----------+------------------+
|vendor_id|trip_id|trip_distance|fare_amount|store_and_fwd_flag|
+---------+-------+-------------+-----------+------------------+
|1 |1000374|8.4 |42.13 |Y |
|1 |1000371|1.8 |15.32 |N |
|2 |1000372|2.5 |22.15 |N |
|2 |1000373|0.9 |9.01 |N |
+---------+-------+-------------+-----------+------------------+
```
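The `vendor_id=...` directories that appear in the bucket in step 5 follow from the `PARTITIONED BY (vendor_id)` clause: each distinct `vendor_id` value gets its own directory under `taxis/data/`. A plain-Python sketch of how the four inserted rows split into partitions (illustrative only; this is not how Spark actually writes the data):

```python
from collections import defaultdict

# The four rows inserted in the quickstart session above.
rows = [
    (1, 1000371, 1.8, 15.32, "N"),
    (2, 1000372, 2.5, 22.15, "N"),
    (2, 1000373, 0.9, 9.01, "N"),
    (1, 1000374, 8.4, 42.13, "Y"),
]

# PARTITIONED BY (vendor_id): group rows by their first column, the
# partition key, mirroring the directory layout Iceberg produces.
partitions = defaultdict(list)
for row in rows:
    partitions[f"vendor_id={row[0]}"].append(row)

for path, part_rows in sorted(partitions.items()):
    print(f"taxis/data/{path}/ -> {len(part_rows)} row(s)")
```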
5. Check the data in the storage.
```
root@localstack:/opt/code/localstack# awslocal s3 ls s3://warehouse/ --recursive
2025-10-16 03:53:03 1545 taxis/data/vendor_id=1/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00002.parquet
2025-10-16 03:53:03 1590 taxis/data/vendor_id=2/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00001.parquet
2025-10-16 03:53:04 7559 taxis/metadata/9f629d40-ce91-4822-aeee-283d53ec5ef6-m0.avro
2025-10-16 03:53:04 4446 taxis/metadata/snap-6463217948421571140-1-9f629d40-ce91-4822-aeee-283d53ec5ef6.avro
2025-10-16 03:52:47 1006 taxis/metadata/v1.metadata.json
2025-10-16 03:53:04 1970 taxis/metadata/v2.metadata.json
2025-10-16 03:53:04 1 taxis/metadata/version-hint.text
```
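The listing above can also be checked mechanically. A small sketch (not part of this PR) that parses the `s3 ls --recursive` output format (date, time, size, key) and tallies bytes per top-level prefix, separating Parquet data files from Iceberg metadata:

```python
from collections import Counter

# Verbatim listing from the example session above.
listing = """\
2025-10-16 03:53:03 1545 taxis/data/vendor_id=1/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00002.parquet
2025-10-16 03:53:03 1590 taxis/data/vendor_id=2/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00001.parquet
2025-10-16 03:53:04 7559 taxis/metadata/9f629d40-ce91-4822-aeee-283d53ec5ef6-m0.avro
2025-10-16 03:53:04 4446 taxis/metadata/snap-6463217948421571140-1-9f629d40-ce91-4822-aeee-283d53ec5ef6.avro
2025-10-16 03:52:47 1006 taxis/metadata/v1.metadata.json
2025-10-16 03:53:04 1970 taxis/metadata/v2.metadata.json
2025-10-16 03:53:04 1 taxis/metadata/version-hint.text
"""

sizes = Counter()
for line in listing.splitlines():
    date, time, size, key = line.split(maxsplit=3)
    prefix = key.split("/")[1]  # 'data' or 'metadata'
    sizes[prefix] += int(size)

print(dict(sizes))  # bytes of row data vs. Iceberg metadata
```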
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually tested.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #394 from dongjoon-hyun/SPARK-53933.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
examples/localstack.yml | 1 +
examples/spark-connect-server-iceberg.yaml | 45 ++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/examples/localstack.yml b/examples/localstack.yml
index 1c3088b..841ff45 100644
--- a/examples/localstack.yml
+++ b/examples/localstack.yml
@@ -41,6 +41,7 @@ spec:
awslocal s3 mb s3://spark-events;
awslocal s3 mb s3://ingest;
awslocal s3 mb s3://data;
+ awslocal s3 mb s3://warehouse;
awslocal s3 cp /opt/code/localstack/Makefile s3://data/
---
apiVersion: v1
diff --git a/examples/spark-connect-server-iceberg.yaml b/examples/spark-connect-server-iceberg.yaml
new file mode 100644
index 0000000..6f54c63
--- /dev/null
+++ b/examples/spark-connect-server-iceberg.yaml
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: spark.apache.org/v1
+kind: SparkApplication
+metadata:
+ name: spark-connect-server-iceberg
+spec:
+ mainClass: "org.apache.spark.sql.connect.service.SparkConnectServer"
+ sparkConf:
+ spark.dynamicAllocation.enabled: "true"
+ spark.dynamicAllocation.maxExecutors: "3"
+ spark.dynamicAllocation.minExecutors: "3"
+ spark.dynamicAllocation.shuffleTracking.enabled: "true"
+ spark.hadoop.fs.s3a.access.key: "test"
+ spark.hadoop.fs.s3a.endpoint: "http://localstack:4566"
+ spark.hadoop.fs.s3a.path.style.access: "true"
+ spark.hadoop.fs.s3a.secret.key: "test"
+ spark.jars.ivy: "/tmp/.ivy2.5.2"
+ spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.4.1,org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0"
+ spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+ spark.kubernetes.container.image: "apache/spark:4.0.1"
+ spark.kubernetes.driver.pod.excludedFeatureSteps: "org.apache.spark.deploy.k8s.features.KerberosConfDriverFeatureStep"
+ spark.kubernetes.executor.podNamePrefix: "spark-connect-server-iceberg"
+ spark.scheduler.mode: "FAIR"
+ spark.sql.catalog.s3.type: "hadoop"
+ spark.sql.catalog.s3.warehouse: "s3a://warehouse"
+ spark.sql.catalog.s3: "org.apache.iceberg.spark.SparkCatalog"
+ spark.sql.defaultCatalog: "s3"
+ spark.sql.extensions: "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
+ applicationTolerations:
+ resourceRetainPolicy: OnFailure
+ runtimeVersions:
+ sparkVersion: "4.0.1"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]