RajasekarSribalan commented on issue #1823: URL: https://github.com/apache/hudi/issues/1823#issuecomment-660502873
Thanks @bvaradar @bhasudha. One more problem I see is how the compaction and the cleaner should be configured. Should both have the same values? What if I configure the cleaner to retain 3 commits, so that I reclaim more space, and compaction to happen after 24 commits? Since I am running the cleaner frequently, will the delta commits be cleaned/deleted before compaction?

Please shed some light on this, because I see tons of files in HDFS for a single table. For example, when I ran a bulk insert to store a table in Hudi, 7000+ parquet files were created, which was fine. After running the streaming pipeline doing upserts on the same table for 2 days, I see 90,000+ files in HDFS. I haven't changed the default cleaner configuration, so I believe cleaning happens after 24 commits? So that's the reason I have this many files. Please correct me if I am wrong.
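For context, the two knobs in question are independent writer configs. A minimal sketch of the setup described above (retain 3 commits, compact after 24 delta commits), assuming a Spark DataFrame writer; the record key field, table name, and path below are hypothetical placeholders:

```scala
// Sketch only: assumes an existing Spark DataFrame `df` being upserted
// into a Hudi table. Cleaner and compaction are configured separately.
df.write
  .format("hudi")
  // Cleaner: keep file versions for only the last 3 commits (reclaims space sooner)
  .option("hoodie.cleaner.commits.retained", "3")
  // Compaction: run inline compaction after 24 delta commits accumulate
  .option("hoodie.compact.inline", "true")
  .option("hoodie.compact.inline.max.delta.commits", "24")
  .option("hoodie.datasource.write.recordkey.field", "id") // hypothetical key field
  .option("hoodie.table.name", "my_table")                 // hypothetical table name
  .mode("append")
  .save("/path/to/hudi/table")                             // hypothetical path
```

This is not a recommendation that these two values are safe together, only an illustration of where each setting lives.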
