voonhous commented on issue #8071:
URL: https://github.com/apache/hudi/issues/8071#issuecomment-1454525599

   > Sorry for late reply, did you already use the append and it is still slow?
   
   Yep, judging from the stack trace, he is running his job in append-only mode.
   
   ```log
   org.apache.hudi.sink.append.AppendWriteFunction.initWriterHelper(AppendWriteFunction.java:110
   ```
   
   > Then we switched to the snappy format, and the writing performance did improve to a certain extent. However, due to the Tencent Cloud COS we use for storage, there was a list frequency control problem in COW writing, so the overall performance could not be greatly improved, and the exception is as follows:
   
   This feels like a COS issue. @DavidZ1 you mentioned `there was a list frequency control problem in cow writing`. So, is it spending too much time listing files? IIUC, your job might be writing too many parquet files while flushing. I am not very familiar with COS, so I am taking a shot in the dark here, but looking at your configurations, the default `write.parquet.max.file.size` is used, which is 120MB.
   
   Perhaps you could try increasing this so that fewer parquet files are written? Do note that your parquet files will get larger.
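   For reference, here is a rough sketch of how that option could be set in a Flink SQL table definition; the table name, schema, and COS path below are purely illustrative:

   ```sql
   -- Sketch only: bump the max parquet file size (in MB) so fewer,
   -- larger files are flushed per checkpoint. Default is 120.
   CREATE TABLE hudi_sink (
     id BIGINT,
     ts TIMESTAMP(3)
   ) WITH (
     'connector' = 'hudi',
     'path' = 'cosn://your-bucket/hudi/hudi_sink',   -- illustrative path
     'table.type' = 'COPY_ON_WRITE',
     'write.operation' = 'insert',                   -- append-only mode
     'write.parquet.max.file.size' = '240'           -- up from the 120MB default
   );
   ```

   Whether 240 is the right value depends on your record size and checkpoint interval, so treat it as a starting point for experimentation rather than a recommendation.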

