ayush-san opened a new issue #2734:
URL: https://github.com/apache/iceberg/issues/2734


   Hi,
   
   While running `MERGE INTO`, I am getting the following exception:
   ```
   java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 0)
     at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
     at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
     at scala.Option.getOrElse(Option.scala:189)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
     at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
     at scala.Option.getOrElse(Option.scala:189)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
     at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:366)
     at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:361)
     at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.writeWithV2(ReplaceDataExec.scala:26)
     at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.run(ReplaceDataExec.scala:31)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3667)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
     at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3665)
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
   ```
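
   If it helps narrow things down, the exception itself just means that Spark zipped two RDDs with different partition counts (here 1 and 0). A standalone spark-shell sketch, independent of Iceberg, that produces the same message:

   ```
   // Run in spark-shell; `spark` is the session provided by the shell.
   val sc = spark.sparkContext

   val one  = sc.parallelize(Seq(1), numSlices = 1)  // RDD with 1 partition
   val none = sc.emptyRDD[Int]                       // RDD with 0 partitions

   // Throws java.lang.IllegalArgumentException:
   //   Can't zip RDDs with unequal numbers of partitions: List(1, 0)
   one.zipPartitions(none) { (a, b) => a ++ b }.collect()
   ```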
   
   My query:
   
   ```
   MERGE INTO db_name.target_iceberg_table target
   USING (SOME QUERY ON ICEBERG TABLE) temp
   ON target.Id = temp.Id
   WHEN MATCHED THEN
     UPDATE SET .....
   WHEN NOT MATCHED THEN
     INSERT *
   ```
   
   Both the source and target tables are Iceberg tables; the temp table holds the incremental data between two runs. I am running Iceberg 0.11.0.
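
   A minimal, self-contained version of what I am running in spark-shell (started with the conf below). The source table name and the `val` column are placeholders, not the real names:

   ```
   // Placeholder schemas standing in for the real tables
   spark.sql("CREATE TABLE IF NOT EXISTS db_name.target_iceberg_table (Id BIGINT, val STRING) USING iceberg")
   spark.sql("CREATE TABLE IF NOT EXISTS db_name.source_iceberg_table (Id BIGINT, val STRING) USING iceberg")

   // Same shape as the failing statement above
   spark.sql("""
     MERGE INTO db_name.target_iceberg_table target
     USING (SELECT * FROM db_name.source_iceberg_table) temp
     ON target.Id = temp.Id
     WHEN MATCHED THEN UPDATE SET target.val = temp.val
     WHEN NOT MATCHED THEN INSERT *
   """)
   ```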
   
   My Spark conf:
   ```
   --conf spark.sql.orc.impl=native
   --conf spark.sql.orc.enableVectorizedReader=true 
   --conf spark.sql.hive.convertMetastoreOrc=true
   --conf spark.shuffle.blockTransferService='nio'
   --conf spark.executor.defaultJavaOptions='-XX:+UseG1GC'
   --conf spark.hadoop.orc.overwrite.output.file=true
   --conf spark.driver.defaultJavaOptions='-XX:+UseG1GC'
   --conf spark.yarn.maxAppAttempts=1 
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
   --conf spark.sql.catalog.spark_catalog.type=hive 
   --conf spark.sql.catalog.spark_catalog.uri=thrift://METASTORE:9083 
   --conf spark.sql.catalog.hive=org.apache.iceberg.spark.SparkCatalog 
   --conf spark.sql.catalog.hive.type=hive 
   --conf spark.sql.catalog.hive.uri=thrift://METASTORE:9083 
   --conf spark.sql.broadcastTimeout=1500 
   ```

