voonhous commented on issue #8071: URL: https://github.com/apache/hudi/issues/8071#issuecomment-1454525599
> Sorry for late reply, did you already use the append and it is still slow?

Yes, judging from the stack trace, he is already running his job in append-only mode:

```log
org.apache.hudi.sink.append.AppendWriteFunction.initWriterHelper(AppendWriteFunction.java:110)
```

> Then we switched to the snappy format, and the writing performance did improve to a certain extent. However, due to the Tencent Cloud COS we used for storage, there was a list frequency control problem in cow writing, so the overall performance could not be greatly improved, and the exception is as follows:

This feels like a COS issue. @DavidZ1, you mentioned that `there was a list frequency control problem in cow writing`. So the job is spending too much time listing files? If I understand correctly, it might be writing too many parquet files while flushing.

I am not very familiar with COS, so I am taking a shot in the dark here: looking at your configuration, the default `write.parquet.max.file.size` of 120MB is used. Perhaps you could try increasing it so that fewer parquet files are written. Do note that your parquet files will get larger as a result.
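For reference, a Flink SQL sketch of where this option would be raised. The table name, schema, path, and the chosen size of 240MB are all made up for illustration; whether a larger file size actually reduces COS list-request pressure enough is an assumption to verify against your workload:

```sql
-- Hypothetical Hudi sink table; only the WITH options matter here.
CREATE TABLE hudi_sink (
  id BIGINT,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'cosn://your-bucket/warehouse/hudi_sink',  -- placeholder path
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'insert',           -- append-only write path
  'write.parquet.max.file.size' = '240'   -- in MB; default is 120, larger => fewer files per flush
);
```

The trade-off is fewer, larger parquet files: listing and flushing overhead should drop, but individual file reads and any downstream compaction will touch bigger files.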
