[GitHub] [iceberg] yanqinghe commented on issue #2958: Bulk-loading data into a table created with the bucket function throws an error

GitBox Sat, 14 Aug 2021 02:00:46 -0700


yanqinghe commented on issue #2958:
URL: https://github.com/apache/iceberg/issues/2958#issuecomment-898868062



   > > If you are using Spark Structured Streaming, you can set the fanout-enabled option when writing:
   > > ```scala
   > >   .writeStream
   > >   .format("iceberg")
   > >   .options(Map(
   > >      "fanout-enabled" -> "true"
   > >   ))
   > >   .outputMode(OutputMode.Append())
   > >   .trigger(Trigger.ProcessingTime(5,  TimeUnit.MINUTES))
   > >   .start()
   > >   .awaitTermination()
   > > ```
   > > 
   > > > > @zephaniah-wzf Could you post the full stack trace?
   > > > 
   > > > 
   > > > [报错信息.txt](https://github.com/apache/iceberg/files/6980457/default.txt)
   > > > It still fails with the same error, which seems a bit unreasonable.
   > 
   > I am writing through the spark-sql client.
   
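   If you are using the spark-sql client, the write should be a batch job. You can try sorting
the data by your partition column with ORDER BY, which keeps each partition's rows together so
the write does not end up with concurrent operations on multiple files in the same partition.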
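
To illustrate why sorting helps, here is a minimal toy model in plain Scala. It is a sketch, not Iceberg's actual writer: `toyBucket` is a hypothetical stand-in for the real bucket transform, and `writeClustered` only mimics a writer that keeps one file open at a time and refuses to reopen a partition it has already closed.

```scala
object ClusteredWriteSketch {
  // Hypothetical stand-in for a bucket transform (assumption: the real
  // transform hashes the value first; a plain modulo is enough for the sketch).
  def toyBucket(id: Long, numBuckets: Int): Int =
    java.lang.Math.floorMod(id, numBuckets.toLong).toInt

  // Toy writer: one open file at a time. When the partition changes, the
  // current file is closed; seeing a closed partition again is an error.
  // Returns the number of files written, or a description of the failure.
  def writeClustered(ids: Seq[Long], numBuckets: Int): Either[String, Int] = {
    // State: (partitions whose file is closed, partition whose file is open)
    val start: Either[String, (Set[Int], Option[Int])] =
      Right((Set.empty[Int], None))
    val result = ids.foldLeft(start) {
      case (Right((closed, open)), id) =>
        val p = toyBucket(id, numBuckets)
        if (open.contains(p)) Right((closed, open)) // same file stays open
        else if (closed.contains(p)) Left(s"partition $p was already closed")
        else Right((closed ++ open.toList, Some(p))) // close old, open new
      case (failed, _) => failed // keep the first failure
    }
    result.map { case (closed, open) => closed.size + open.toList.size }
  }

  def main(args: Array[String]): Unit = {
    val ids = Seq(1L, 2L, 3L, 4L, 5L, 6L)
    // Input order interleaves the two buckets, so a closed file is needed again.
    println(writeClustered(ids, 2))
    // Sorting by the partition value groups each bucket's rows together.
    println(writeClustered(ids.sortBy(toyBucket(_, 2)), 2))
  }
}
```

Note that in this toy model the ids are already sorted by the raw column and the write still fails; what matters is ordering by the partition value itself. For a bucket-partitioned table in Spark, that would mean ordering by the bucket expression. Depending on the Iceberg version, `IcebergSpark.registerBucketUDF` can expose the bucket transform to SQL so it can be used in an `ORDER BY` clause.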


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


