Cqz666 opened a new issue, #5384:
URL: https://github.com/apache/iceberg/issues/5384
I used `INSERT OVERWRITE` to write to an Iceberg table, but Spark failed with the following exception:
```
Caused by: java.lang.IllegalStateException: Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers.
Encountered records that belong to already closed files:
partition 'datekey=20220727/event=search_show' in spec [
  1000: datekey: identity(1)
  1001: event: identity(2)
]
	at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:95)
	at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:34)
	at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:641)
	at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:616)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$7(WriteToDataSourceV2Exec.scala:441)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:477)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:385)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
The code looks something like this:

```scala
spark.sql(
  s"""
     | INSERT OVERWRITE
     | `$icebergCatalog`.`$icebergDatabase`.`$icebergTable`
     | SELECT $selectSql
     | FROM $hiveDatabase.$topic
     | WHERE datekey >= $lowerDate AND datekey <= $upperDate AND event IN ($eventStr)
     |""".stripMargin
)
```
The data source is a partitioned Hive table.
How do I correctly write to Iceberg partitioned tables with `INSERT OVERWRITE` (or some other approach)?
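As a note for anyone hitting the same error: the exception itself names the two remedies, i.e. cluster the incoming records by partition, or switch to fanout writers. A minimal sketch of the first remedy, assuming the same query and the partition columns `datekey` and `event` from the error message, is to sort the rows by the partition columns before the write so that each task sees each partition's rows contiguously:

```scala
// Sketch only: $icebergCatalog, $icebergDatabase, $icebergTable, $selectSql,
// $hiveDatabase, $topic, $lowerDate, $upperDate, $eventStr are the reporter's
// own interpolated variables, not defined here.
spark.sql(
  s"""
     | INSERT OVERWRITE
     | `$icebergCatalog`.`$icebergDatabase`.`$icebergTable`
     | SELECT $selectSql
     | FROM $hiveDatabase.$topic
     | WHERE datekey >= $lowerDate AND datekey <= $upperDate AND event IN ($eventStr)
     | ORDER BY datekey, event  -- cluster rows by partition before writing
     |""".stripMargin
)
```

The second remedy, for DataFrame writes, is Iceberg's Spark write option `fanout-enabled` (e.g. `df.writeTo(table).option("fanout-enabled", "true")`), which keeps a writer open per partition at the cost of more memory; whether it is preferable depends on the partition cardinality per task.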
Iceberg version: 0.13.2
Spark version: 3.0.1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]