[GitHub] [iceberg] jackye1995 commented on issue #2958: After creating a table with the bucket function, bulk-importing data into it fails with an error

GitBox Thu, 12 Aug 2021 14:14:41 -0700


jackye1995 commented on issue #2958:
URL: https://github.com/apache/iceberg/issues/2958#issuecomment-897972554



   ```
   Caused by: java.lang.IllegalStateException: Already closed files for partition: id_bucket=3
        at org.apache.iceberg.io.PartitionedWriter.write(PartitionedWriter.java:69)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$7(WriteToDataSourceV2Exec.scala:441)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:477)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:385)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:127)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
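   Judging from the error you uploaded, @coolderli's approach should be correct: you need to use the fanout writer. With a bucket function you usually cannot pre-sort the data in advance, so without the fanout writer this problem is bound to occur. In theory you could also compute which bucket each row falls into ahead of time, sort by that, and then write, which would remove the need for fanout, but that is more cumbersome.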
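
To make the reasoning above concrete, here is a minimal toy model in Python of the two writer strategies. It is not Iceberg's code: the class names `ClusteredWriter` and `FanoutWriter`, the helper `write_all`, and the modulo-based `bucket` function are all hypothetical stand-ins (Iceberg's real bucket transform uses a Murmur3 hash). It only illustrates why a writer that keeps a single open file per task rejects a partition it has already closed, while a fanout-style writer that keeps one open file per partition does not.

```python
from collections import defaultdict


class ClusteredWriter:
    """Toy model: one open file at a time; a closed partition cannot be reopened."""

    def __init__(self):
        self.current = None            # partition whose file is currently open
        self.closed = set()            # partitions whose files were already closed
        self.files = defaultdict(list)

    def write(self, partition, row):
        if partition != self.current:
            if partition in self.closed:
                raise RuntimeError(f"Already closed files for partition: {partition}")
            if self.current is not None:
                self.closed.add(self.current)   # moving on closes the previous file
            self.current = partition
        self.files[partition].append(row)


class FanoutWriter:
    """Toy model: keeps one open file per partition, so input order is irrelevant."""

    def __init__(self):
        self.files = defaultdict(list)

    def write(self, partition, row):
        self.files[partition].append(row)


def bucket(value, num_buckets=4):
    # Hypothetical stand-in for a bucket transform (assumption: plain modulo).
    return value % num_buckets


def write_all(writer, ids, num_buckets=4):
    """Route each id to its bucket partition and return the resulting files."""
    for i in ids:
        writer.write(f"id_bucket={bucket(i, num_buckets)}", i)
    return dict(writer.files)


if __name__ == "__main__":
    ids = [3, 4, 7]                    # buckets 3, 0, 3: bucket 3 comes back after 0
    try:
        write_all(ClusteredWriter(), ids)
    except RuntimeError as err:
        print("clustered writer failed:", err)
    print("fanout writer:", write_all(FanoutWriter(), ids))
    # Sorting by bucket first lets the clustered writer succeed without fanout.
    print("clustered, pre-sorted:", write_all(ClusteredWriter(), sorted(ids, key=bucket)))
```

The trade-off in this sketch mirrors the comment: fanout tolerates any input order at the cost of holding more files open at once, while pre-sorting by bucket keeps a single open file but requires computing the bucket for every row up front. In Iceberg itself, fanout for Spark writes is controlled by the `write.spark.fanout.enabled` table property; whether it is available depends on the Iceberg version in use.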


