ayush-san opened a new issue #2734:
URL: https://github.com/apache/iceberg/issues/2734
Hi,
While running `MERGE INTO`, I am getting the following exception:
```
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 0)
    at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:366)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:361)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.writeWithV2(ReplaceDataExec.scala:26)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.run(ReplaceDataExec.scala:31)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3667)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3665)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
```
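For context, the exception itself comes from Spark's `ZippedPartitionsBaseRDD`, which refuses to zip two RDDs whose partition counts differ (here 1 and 0). A standalone sketch that triggers the same message, using plain RDDs rather than the Iceberg write path (all names here are hypothetical, for illustration only):
```scala
import org.apache.spark.sql.SparkSession

// Sketch: zipPartitions requires both RDDs to have the same number of
// partitions; pairing a 1-partition RDD with a 0-partition RDD throws
// the same IllegalArgumentException as in the stack trace above.
object ZipPartitionsMismatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("zip-demo").getOrCreate()
    val sc = spark.sparkContext

    val onePartition   = sc.parallelize(Seq(1, 2, 3), numSlices = 1) // 1 partition
    val zeroPartitions = sc.emptyRDD[Int]                            // 0 partitions

    // Throws: java.lang.IllegalArgumentException:
    //   Can't zip RDDs with unequal numbers of partitions: List(1, 0)
    onePartition.zipPartitions(zeroPartitions)((a, b) => a ++ b).count()

    spark.stop()
  }
}
```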
My query:
```
MERGE INTO db_name.target_iceberg_table target
USING (SOME QUERY ON ICEBERG TABLE) temp
ON target.Id = temp.Id
WHEN MATCHED THEN
  UPDATE SET .....
WHEN NOT MATCHED THEN
  INSERT *
```
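In case it helps with reproduction, a minimal self-contained shape of what I am running looks roughly like this (schema, column, and table names are placeholders, not my actual job; it assumes an active `spark` session with the catalogs configured as below):
```scala
// Hypothetical reproduction sketch: two Iceberg tables, then a MERGE INTO
// with an UPDATE branch and an INSERT * branch, as in the query above.
spark.sql("CREATE TABLE db_name.target_iceberg_table (Id BIGINT, val STRING) USING iceberg")
spark.sql("CREATE TABLE db_name.source_iceberg_table (Id BIGINT, val STRING) USING iceberg")

spark.sql("""
  MERGE INTO db_name.target_iceberg_table target
  USING (SELECT Id, val FROM db_name.source_iceberg_table) temp
  ON target.Id = temp.Id
  WHEN MATCHED THEN UPDATE SET target.val = temp.val
  WHEN NOT MATCHED THEN INSERT *
""")
```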
Both the source and target tables are Iceberg tables; the temp table holds the incremental data between two runs. I am running this with Iceberg 0.11.0.
My Spark conf:
```
--conf spark.sql.orc.impl=native
--conf spark.sql.orc.enableVectorizedReader=true
--conf spark.sql.hive.convertMetastoreOrc=true
--conf spark.shuffle.blockTransferService='nio'
--conf spark.executor.defaultJavaOptions='-XX:+UseG1GC'
--conf spark.hadoop.orc.overwrite.output.file=true
--conf spark.driver.defaultJavaOptions='-XX:+UseG1GC'
--conf spark.yarn.maxAppAttempts=1
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
--conf spark.sql.catalog.spark_catalog.type=hive
--conf spark.sql.catalog.spark_catalog.uri=thrift://METASTORE:9083
--conf spark.sql.catalog.hive=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.hive.type=hive
--conf spark.sql.catalog.hive.uri=thrift://METASTORE:9083
--conf spark.sql.broadcastTimeout=1500
```
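For completeness, the Iceberg-specific flags above correspond to this programmatic session setup (a sketch; `METASTORE` is a placeholder host, as in the flags):
```scala
import org.apache.spark.sql.SparkSession

// Programmatic equivalent of the --conf flags that wire up the Iceberg
// session extensions and the two Hive-backed catalogs.
val spark = SparkSession.builder()
  .appName("merge-into-repro")
  .config("spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.iceberg.spark.SparkSessionCatalog")
  .config("spark.sql.catalog.spark_catalog.type", "hive")
  .config("spark.sql.catalog.spark_catalog.uri", "thrift://METASTORE:9083")
  .config("spark.sql.catalog.hive", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive.type", "hive")
  .config("spark.sql.catalog.hive.uri", "thrift://METASTORE:9083")
  .getOrCreate()
```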