Cqz666 opened a new issue, #5384:
URL: https://github.com/apache/iceberg/issues/5384
I used `INSERT OVERWRITE` to write to an Iceberg table, but Spark failed with the following exception:
```
Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers.
Encountered records that belong to already closed files:
partition 'datekey=20220727/event=search_show' in spec [
  1000: datekey: identity(1)
  1001: event: identity(2)
]
	at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:95)
	at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:34)
	at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:641)
	at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:616)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$7(WriteToDataSourceV2Exec.scala:441)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:477)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:385)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
The code looks something like this:

```scala
spark.sql(
  s"""
     | INSERT OVERWRITE
     | `$icebergCatalog`.`$icebergDatabase`.`$icebergTable`
     | SELECT $selectSql
     | FROM $hiveDatabase.$topic
     | WHERE datekey >= $lowerDate AND datekey <= $upperDate AND event IN ($eventStr)
     |""".stripMargin
)
```
The data source is a partitioned Hive table.
How do I correctly write to Iceberg partitioned tables with `INSERT OVERWRITE` (or some other approach)?
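As a note for anyone hitting the same error: the exception itself names the two remedies, i.e. cluster the incoming records by partition, or switch to fanout writers. A minimal sketch of the first remedy, assuming the same query and the partition columns `datekey` and `event` from the error message, is to sort the rows by the partition columns before the write so that each task sees each partition's rows contiguously:

```scala
// Sketch only: $icebergCatalog, $icebergDatabase, $icebergTable, $selectSql,
// $hiveDatabase, $topic, $lowerDate, $upperDate, $eventStr are the reporter's
// own interpolated variables, not defined here.
spark.sql(
  s"""
     | INSERT OVERWRITE
     | `$icebergCatalog`.`$icebergDatabase`.`$icebergTable`
     | SELECT $selectSql
     | FROM $hiveDatabase.$topic
     | WHERE datekey >= $lowerDate AND datekey <= $upperDate AND event IN ($eventStr)
     | ORDER BY datekey, event  -- cluster rows by partition before writing
     |""".stripMargin
)
```

The second remedy, for DataFrame writes, is Iceberg's Spark write option `fanout-enabled` (e.g. `df.writeTo(table).option("fanout-enabled", "true")`), which keeps a writer open per partition at the cost of more memory; whether it is preferable depends on the partition cardinality per task.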
Iceberg version: 0.13.2
Spark version: 3.0.1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]