guanziyue commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1261243810
Not sure whether the author uses Spark. I do understand that this saves a lot of time on huge tables, especially in Spark streaming mode. In Spark, no write task can start until the FileSystemView finishes loading, because Hudi on Spark needs the FileSystemView information to identify small files before it generates write tasks. In my opinion, the memory problem can be solved by other configs, for example by using the RocksDB-based FileSystemView, which is nearly compulsory for large Hudi tables. But apart from this PR, there is little we can do about the loading time.
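For reference, a minimal sketch of the writer options that would switch the FileSystemView to the RocksDB-backed store. I believe `hoodie.filesystem.view.type` and `hoodie.filesystem.view.rocksdb.base.path` are the relevant Hudi keys, but please check them against the Hudi version in use. The helper name, table name, and directory are hypothetical and only for illustration.

```python
# Sketch only: builds Hudi writer options that select the RocksDB-backed
# FileSystemView. The config keys should be verified against your Hudi version.

def rocksdb_view_options(table_name, rocksdb_dir):
    """Return writer options for a RocksDB-backed FileSystemView (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        # EMBEDDED_KV_STORE is assumed to select the RocksDB-based view
        # instead of the default in-memory one.
        "hoodie.filesystem.view.type": "EMBEDDED_KV_STORE",
        # Local directory where the view's RocksDB files would be kept (assumed value).
        "hoodie.filesystem.view.rocksdb.base.path": rocksdb_dir,
    }


opts = rocksdb_view_options("events", "/tmp/hudi_rocksdb_view")
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

With Spark, these would typically be passed along with the other write options, e.g. `df.write.format("hudi").options(**opts)`. Note this only addresses memory; it does not shorten the time spent loading the view.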
