Shekkylar opened a new issue, #10675:
URL: https://github.com/apache/hudi/issues/10675
## Issue Summary
I am encountering challenges while integrating Hudi with the Spark Connect server from Golang: insert, update, and upsert queries result in errors, while create table and select queries work without issues.
## Environment
- Java 8
- EMR cluster, release emr-7.0
- Spark version 3.5.0
- Spark Connect server started on port 15002
- Golang v1.21.7, connecting to Spark locally via SSH tunneling (see the tunnel sketch after this list)
- Glue metastore for the catalog
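
For reference, the SSH tunnel maps a local port to the Spark Connect server port on the cluster. A minimal sketch (the hostname and key path are placeholders, not taken from the actual setup):

```bash
# Forward local port 8157 to the Spark Connect server (port 15002) on the EMR primary node.
# Hostname and key path are placeholders.
ssh -i ~/emr-key.pem -N -L 8157:localhost:15002 hadoop@<emr-primary-node-dns>
```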
### Start Spark Server Command
I executed the following command on the EMR primary node, from the Spark installation directory:
```bash
cd /usr/lib/spark
./sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 \
  --jars /usr/lib/hudi/hudi-spark-bundle.jar \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.aws.glue.sync.tool.classes=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool"
```
### Golang Code Snippet

```go
package main

import (
	"fmt"
	"log"

	"github.com/apache/spark-connect-go/v34/client/sql"
)

func main() {
	// Connect through the local end of the SSH tunnel.
	remote := "sc://localhost:8157"
	spark, err := sql.SparkSession.Builder.Remote(remote).Build()
	if err != nil {
		log.Fatal("Failed to connect to Spark: ", err)
	}

	// Example SQL query to show all tables.
	alltab, err := spark.Sql("SHOW TABLES")
	if err != nil {
		log.Fatal("Failed to execute SQL query: ", err)
	}
	alltab.Show(10, true)

	// Create the Hudi table with a basic schema.
	_, err = spark.Sql(`create table hudi_table (
		id bigint,
		name string,
		dt string
	) using hudi
	LOCATION "s3://spark-hudi-table/output/"
	TBLPROPERTIES (
		type = "cow",
		primaryKey = "id"
	)
	partitioned by (dt)`)
	if err != nil {
		fmt.Println("Failed to create Hudi table:", err)
	}

	// Insert data into the Hudi table; this is the statement that fails.
	_, err = spark.Sql(`insert into default.hudi_table (id, name, dt)
		VALUES (1, 'test 1', '2023-11-11'), (2, 'test 2', '2023-11-12')`)
	if err != nil {
		fmt.Println("Failed to insert data into Hudi table:", err)
	}

	// Query the Hudi table and show the result.
	result, err := spark.Sql("SELECT * FROM hudi_table")
	if err != nil {
		fmt.Println("Failed to query Hudi table:", err)
	}
	result.Show(10, true)

	// Stop the Spark session.
	spark.Stop()
}
```
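
To build and run the snippet, a standard Go module setup along these lines should work (the module name is arbitrary):

```bash
# Assumed module setup for the snippet above; the module name is a placeholder.
go mod init hudi-connect-test
go get github.com/apache/spark-connect-go/v34
go run main.go
```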
## Issue Details
While executing the insert query, the Spark job fails with the following error, taken from the Spark Connect server logs:
```
24/02/15 12:43:14 INFO Javalin: Starting Javalin ...
24/02/15 12:43:14 INFO Javalin: You are running Javalin 4.6.7 (released
October 24, 2022. Your Javalin version is 479 days old. Consider checking for a
newer version.).
24/02/15 12:43:14 INFO Javalin: Listening on http://localhost:39459/
24/02/15 12:43:14 INFO Javalin: Javalin started in 151ms \o/
24/02/15 12:43:14 INFO CodeGenerator: Code generated in 14.79973 ms
24/02/15 12:43:14 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:14 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:14 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:15 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:15 INFO MultipartUploadOutputStream: close closed:false
s3://spark-hudi-table/output/.hoodie/20240215124314041.commit.requested
24/02/15 12:43:15 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:15 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/hoodie.properties' for reading
24/02/15 12:43:16 INFO MultipartUploadOutputStream: close closed:false
s3://spark-hudi-table/output/.hoodie/metadata/.hoodie/hoodie.properties
24/02/15 12:43:17 INFO S3NativeFileSystem: Opening
's3://spark-hudi-table/output/.hoodie/metadata/.hoodie/hoodie.properties' for
reading
24/02/15 12:43:17 INFO SparkContext: Starting job: Spark Connect -
session_id: "66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...
24/02/15 12:43:17 INFO DAGScheduler: Got job 0 (Spark Connect - session_id:
"66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...) with 1 output partitions
24/02/15 12:43:17 INFO DAGScheduler: Final stage: ResultStage 0 (Spark
Connect - session_id: "66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...)
24/02/15 12:43:17 INFO DAGScheduler: Parents of final stage: List()
24/02/15 12:43:17 INFO DAGScheduler: Missing parents: List()
24/02/15 12:43:17 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[7] at Spark Connect - session_id:
"66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...), which has no missing parents
24/02/15 12:43:17 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 112.6 KiB, free 1048.7 MiB)
24/02/15 12:43:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 42.1 KiB, free 1048.6 MiB)
24/02/15 12:43:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on ip-172-31-46-34.ap-south-1.compute.internal:42365 (size: 42.1 KiB, free:
1048.8 MiB)
24/02/15 12:43:17 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:1656
24/02/15 12:43:17 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[7] at Spark Connect - session_id:
"66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...) (first 15 tasks are for partitions Vector(0))
24/02/15 12:43:17 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
resource profile 0
24/02/15 12:43:18 INFO ExecutorAllocationManager: Requesting 1 new executor
because tasks are backlogged (new desired total will be 1 for resource profile
id: 0)
24/02/15 12:43:22 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered
executor NettyRpcEndpointRef(spark-client://Executor) (172.31.40.114:55534)
with ID 2, ResourceProfileId 0
24/02/15 12:43:22 INFO ExecutorMonitor: New executor 2 has registered (new
total is 1)
24/02/15 12:43:22 INFO BlockManagerMasterEndpoint: Registering block manager
ip-172-31-40-114.ap-south-1.compute.internal:45125 with 2.3 GiB RAM,
BlockManagerId(2, ip-172-31-40-114.ap-south-1.compute.internal, 45125, None)
24/02/15 12:43:22 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (ip-172-31-40-114.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 7863 bytes)
24/02/15 12:43:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on ip-172-31-40-114.ap-south-1.compute.internal:45125 (size: 42.1 KiB, free:
2.3 GiB)
24/02/15 12:43:24 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0)
(ip-172-31-40-114.ap-south-1.compute.internal executor 2):
java.lang.ClassCastException: class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
at
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2446)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
24/02/15 12:43:24 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID
1) (ip-172-31-40-114.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 7863 bytes)
24/02/15 12:43:24 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on
ip-172-31-40-114.ap-south-1.compute.internal, executor 2:
java.lang.ClassCastException (class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)) [duplicate 1]
24/02/15 12:43:24 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID
2) (ip-172-31-40-114.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 7863 bytes)
24/02/15 12:43:24 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on
ip-172-31-40-114.ap-south-1.compute.internal, executor 2:
java.lang.ClassCastException (class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)) [duplicate 2]
24/02/15 12:43:24 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID
3) (ip-172-31-40-114.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 7863 bytes)
24/02/15 12:43:24 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on
ip-172-31-40-114.ap-south-1.compute.internal, executor 2:
java.lang.ClassCastException (class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)) [duplicate 3]
24/02/15 12:43:24 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times;
aborting job
24/02/15 12:43:24 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have
all completed, from pool
24/02/15 12:43:24 INFO YarnScheduler: Cancelling stage 0
24/02/15 12:43:24 INFO YarnScheduler: Killing all running tasks in stage 0:
Stage cancelled: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4
times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3)
(ip-172-31-40-114.ap-south-1.compute.internal executor 2):
java.lang.ClassCastException: class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
at
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2446)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Driver stacktrace:
24/02/15 12:43:24 INFO DAGScheduler: ResultStage 0 (Spark Connect -
session_id: "66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
...) failed in 6.879 s due to Job aborted due to stage failure: Task 0
in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
(TID 3) (ip-172-31-40-114.ap-south-1.compute.internal executor 2):
java.lang.ClassCastException: class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
at
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2446)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Driver stacktrace:
24/02/15 12:43:24 INFO DAGScheduler: Job 0 failed: Spark Connect -
session_id: "66f20158-e2df-4941-b6f4-4565c534143b"
user_context {
user_id: "na"
}
plan {
command {
sql_command {
..., took 6.932404 s
24/02/15 12:43:24 INFO DAGScheduler: Asked to cancel jobs with tag
SparkConnect_OperationTag_User_66f20158-e2df-4941-b6f4-4565c534143b_Session_na_Operation_28cf34dd-febc-4944-b727-e6dff0849635
24/02/15 12:43:24 INFO ErrorUtils: Spark Connect error during: execute.
UserId: na. SessionId: 66f20158-e2df-4941-b6f4-4565c534143b.
org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata
table
at
org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1263)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1303)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:164)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:218)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:431)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:113)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:129)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:165)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:165)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:276)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:164)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:70)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:503)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
~[spark-sql-api_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:503)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:33)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:479)
~[spark-catalyst_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:101)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:88)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:86)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:222)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleSqlCommand(SparkConnectPlanner.scala:2461)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2426)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handleCommand(ExecuteThreadRunner.scala:202)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:158)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:132)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:189)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
~[spark-sql_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:189)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withContextClassLoader$1(SessionHolder.scala:176)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:179)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.sql.connect.service.SessionHolder.withContextClassLoader(SessionHolder.scala:175)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:188)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:132)
~[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:84)
[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
at
org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:228)
[org.apache.spark_spark-connect_2.12-3.5.0.jar:3.5.0]
Caused by: org.apache.spark.SparkException: Job aborted due to stage
failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
in stage 0.0 (TID 3) (ip-172-31-40-114.ap-south-1.compute.internal executor 2):
java.lang.ClassCastException: class org.apache.hudi.hadoop.SerializablePath
cannot be cast to class org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
at
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2446)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3067)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3003)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3002)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
~[scala-library-2.12.17.jar:?]
at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
~[scala-library-2.12.17.jar:?]
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
~[scala-library-2.12.17.jar:?]
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3002)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1318)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1318)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.17.jar:?]
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1318)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3271)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3205)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3194)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1041)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2406)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2427)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2446)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2471)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1046)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:407)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.rdd.RDD.collect(RDD.scala:1045)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:116)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.listAllPartitionsFromFilesystem(HoodieBackedTableMetadataWriter.java:654)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:388)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:278)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:182)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:95)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:72)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
at
org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:287)
~[hudi-spark3.5-bundle_2.12-0.14.0-amzn-1.jar:0.14.0-amzn-1]
... 61 more
Caused by: java.lang.ClassCastException: class
org.apache.hudi.hadoop.SerializablePath cannot be cast to class
org.apache.hudi.hadoop.SerializablePath
(org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.MutableURLClassLoader @5b675711;
org.apache.hudi.hadoop.SerializablePath is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @1875dd5a)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
~[scala-library-2.12.17.jar:?]
at scala.collection.Iterator.foreach(Iterator.scala:943)
~[scala-library-2.12.17.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943)
~[scala-library-2.12.17.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
~[scala-library-2.12.17.jar:?]
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
~[scala-library-2.12.17.jar:?]
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
~[scala-library-2.12.17.jar:?]
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
~[scala-library-2.12.17.jar:?]
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
~[scala-library-2.12.17.jar:?]
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
~[scala-library-2.12.17.jar:?]
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
~[scala-library-2.12.17.jar:?]
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
~[scala-library-2.12.17.jar:?]
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
~[scala-library-2.12.17.jar:?]
at
scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
~[scala-library-2.12.17.jar:?]
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
~[scala-library-2.12.17.jar:?]
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
~[scala-library-2.12.17.jar:?]
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
~[scala-library-2.12.17.jar:?]
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
~[scala-library-2.12.17.jar:?]
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2446)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.scheduler.Task.run(Task.scala:143)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
~[spark-common-utils_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
~[spark-common-utils_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
~[?:?]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
24/02/15 12:43:24 INFO ExecuteGrpcResponseSender: Stream finished for
opId=28cf34dd-febc-4944-b727-e6dff0849635, sent all responses up to last index
0. totalTime=10955578375ns waitingForResults=10955491381ns waitingForSend=0ns
24/02/15 12:43:24 INFO SparkConnectExecutionManager: ExecuteHolder
ExecuteKey(na,66f20158-e2df-4941-b6f4-4565c534143b,28cf34dd-febc-4944-b727-e6dff0849635)
is removed.
24/02/15 12:43:24 INFO ExecuteResponseObserver: Release all for
opId=28cf34dd-febc-4944-b727-e6dff0849635. Execution stats:
total=CachedSize(0,0) autoRemoved=CachedSize(0,0)
cachedUntilConsumed=CachedSize(0,0) cachedUntilProduced=CachedSize(0,0)
maxCachedUntilConsumed=CachedSize(0,0) maxCachedUntilProduced=CachedSize(0,0)
24/02/15 12:43:24 INFO SparkConnectExecutionManager: ExecuteHolder
ExecuteKey(na,66f20158-e2df-4941-b6f4-4565c534143b,c3ec3331-c542-4541-8d73-1d71d3c7e3a3)
is created.
24/02/15 12:43:24 INFO ExecuteGrpcResponseSender: Starting for
opId=c3ec3331-c542-4541-8d73-1d71d3c7e3a3, reattachable=false,
lastConsumedStreamIndex=0
24/02/15 12:43:24 INFO ExecuteGrpcResponseSender: Stream finished for
opId=c3ec3331-c542-4541-8d73-1d71d3c7e3a3, sent all responses up to last index
2. totalTime=160844370ns waitingForResults=158793210ns waitingForSend=0ns
24/02/15 12:43:24 INFO SparkConnectExecutionManager: ExecuteHolder
ExecuteKey(na,66f20158-e2df-4941-b6f4-4565c534143b,c3ec3331-c542-4541-8d73-1d71d3c7e3a3)
is removed.
24/02/15 12:43:24 INFO ExecuteResponseObserver: Release all for
opId=c3ec3331-c542-4541-8d73-1d71d3c7e3a3. Execution stats:
total=CachedSize(424,2) autoRemoved=CachedSize(147,1)
cachedUntilConsumed=CachedSize(0,0) cachedUntilProduced=CachedSize(0,0)
maxCachedUntilConsumed=CachedSize(424,2)
maxCachedUntilProduced=CachedSize(424,2)
24/02/15 12:43:24 INFO SparkConnectExecutionManager: ExecuteHolder
ExecuteKey(na,66f20158-e2df-4941-b6f4-4565c534143b,82079c9e-3ef5-485c-9b0c-2fe71d117eb6)
is created.
24/02/15 12:43:24 INFO ExecuteGrpcResponseSender: Starting for
opId=82079c9e-3ef5-485c-9b0c-2fe71d117eb6, reattachable=false,
lastConsumedStreamIndex=0
24/02/15 12:43:24 INFO CodeGenerator: Code generated in 42.971768 ms
24/02/15 12:43:24 INFO ExecuteGrpcResponseSender: Stream finished for
opId=82079c9e-3ef5-485c-9b0c-2fe71d117eb6, sent all responses up to last index
3. totalTime=404638217ns waitingForResults=403497209ns waitingForSend=0ns
24/02/15 12:43:24 INFO SparkConnectExecutionManager: ExecuteHolder
ExecuteKey(na,66f20158-e2df-4941-b6f4-4565c534143b,82079c9e-3ef5-485c-9b0c-2fe71d117eb6)
is removed.
24/02/15 12:43:24 INFO ExecuteResponseObserver: Release all for
opId=82079c9e-3ef5-485c-9b0c-2fe71d117eb6. Execution stats:
total=CachedSize(1243,3) autoRemoved=CachedSize(1060,2)
cachedUntilConsumed=CachedSize(0,0) cachedUntilProduced=CachedSize(0,0)
maxCachedUntilConsumed=CachedSize(1105,2)
maxCachedUntilProduced=CachedSize(1105,2)
24/02/15 12:43:36 INFO SparkConnectExecutionManager: Started periodic run of
SparkConnectExecutionManager maintenance.
24/02/15 12:43:36 INFO SparkConnectExecutionManager: Finished periodic run
of SparkConnectExecutionManager maintenance.
24/02/15 12:44:06 INFO SparkConnectExecutionManager: Started periodic run of
SparkConnectExecutionManager maintenance.
24/02/15 12:44:06 INFO SparkConnectExecutionManager: Finished periodic run
of SparkConnectExecutionManager maintenance.
...
```
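
The exception shows `org.apache.hudi.hadoop.SerializablePath` being loaded by two different classloaders (`MutableURLClassLoader` and `ChildFirstURLClassLoader`), which suggests the Hudi bundle may be reaching the executors via two classpaths. A rough check for duplicate bundles on the node (the paths below are the EMR default locations referenced above, not verified):

```bash
# Sketch: look for Hudi bundles both in the --jars location and on the
# base Spark classpath (typical EMR paths; adjust as needed).
ls -l /usr/lib/hudi/ | grep -i hudi
ls -l /usr/lib/spark/jars/ | grep -i hudi
```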
## Request for Assistance
I kindly request assistance from the community in identifying and resolving this issue. Create table and select queries work as expected, but insert, update, and upsert queries consistently fail.