bvaradar commented on issue #1902: URL: https://github.com/apache/hudi/issues/1902#issuecomment-672406725
With bulk insert, the parallelism configuration determines the lower bound on the number of files. Since you started with bulk insert, you are seeing that many files. Hudi upsert/insert will route "new records" (with new record keys) to these small files. So, if there are new records in the same partition, you will see those small files grow.
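To make this concrete, here is a rough sketch of the write options involved. The helper `hudi_write_options`, the table name `my_table`, the parallelism of 50, and the size values are illustrative assumptions, not taken from this issue. The `hoodie.*` keys are standard Hudi write configs, but please check the defaults against the Hudi version you are running.

```python
def hudi_write_options(table_name, operation, parallelism=None):
    """Hypothetical helper: build a dict of Hudi write options."""
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        # Files smaller than this are treated as "small files" and are
        # candidates to receive new records on upsert/insert.
        # 100 MB is an assumed, illustrative value.
        "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
        # Target maximum size of a base file (assumed 120 MB here).
        "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
    }
    if operation == "bulk_insert" and parallelism is not None:
        # For bulk insert, this parallelism sets the lower bound on the
        # number of files written.
        opts["hoodie.bulkinsert.shuffle.parallelism"] = str(parallelism)
    return opts


# Initial load: at least 50 files, however small the input is.
initial_load = hudi_write_options("my_table", "bulk_insert", parallelism=50)

# Later writes: new record keys get routed into the existing small files.
incremental = hudi_write_options("my_table", "upsert")
```

With a Spark DataFrame `df` and a base path of your own, these would be passed as `df.write.format("hudi").options(**initial_load).mode("overwrite").save(base_path)` for the first load and with `incremental` and `mode("append")` afterwards. Lowering the bulk insert parallelism for the initial load is one way to start with fewer, larger files.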
