koldic commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-1333488726
Hi, I have the same problem with slow stages. At first the job runs well, but as more and more small files are inserted it slows down, and the `Getting Small files` stage together with the `Doing partition and writing data` stage can take up to an hour to finish. I tried lowering `hoodie.parquet.small.file.limit` to the smallest value I could (1 MB) to limit the number of small files it collects, but that didn't help. Setting it to 0 did help, since that value disables the small-file collection stage entirely. Is there a way to turn this setting back on without slowing down all jobs, or should I just use offline compaction? I also use a simple index, as the key is random.
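For reference, the two settings I compared look roughly like the sketch below. The helper `hudi_write_options` and the table name `my_table` are placeholders for illustration, not my actual job, and the write call in the trailing comment assumes a Spark session with the Hudi bundle available.

```python
# Minimal sketch of the two configurations described above.
# `hudi_write_options` and "my_table" are hypothetical names used for illustration.

ONE_MB = 1024 * 1024  # hoodie.parquet.small.file.limit is expressed in bytes


def hudi_write_options(small_file_limit_bytes: int) -> dict:
    """Build Hudi writer options as strings, the form Spark writer options take."""
    return {
        "hoodie.table.name": "my_table",  # placeholder table name
        "hoodie.index.type": "SIMPLE",    # simple index, since the key is random
        "hoodie.parquet.small.file.limit": str(small_file_limit_bytes),
    }


# 1 MB limit: small files are still collected on every write (still slow for me).
slow_options = hudi_write_options(ONE_MB)

# 0: small-file collection is disabled, so the slow stage is skipped.
fast_options = hudi_write_options(0)

# Assumed usage with an existing DataFrame `df` and a `base_path`:
# df.write.format("hudi").options(**fast_options).mode("append").save(base_path)
```

With the limit at 0 every insert goes to new files, so file sizing then has to be handled by some separate process.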
