rishabhreply opened a new issue, #10559: URL: https://github.com/apache/hudi/issues/10559
**Describe the problem you faced**

This is a question rather than a problem report; I could not find an answer in the FAQs. Please let me know if this is not an acceptable place to ask.

I have data arriving in multiple files (say 10 files) for one table, and all of them have the same value in the partition column. My setup is a state machine with Glue parallelization enabled. Say I set batch size = 2 and concurrency = 5 in the state machine: it will trigger 5 parallel Glue job instances and give each instance 2 files to process. I am using the **insert_overwrite** Hudi operation.

Q1. In this setup, how will Hudi behave, given that not all Glue job instances will finish at the same time? Will I see Hudi errors, or will later instances overwrite the data written by the instances that finished earlier?

**Environment Description**

* Hudi version :
* Spark version :
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) :

**Additional context**

Add any other context about the problem here.

**Stacktrace**

```Add the stacktrace of the error.```
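For context on the multi-writer question, here is a minimal sketch (not a confirmed answer) of the Hudi write options that enable optimistic concurrency control, which Hudi requires when multiple writers target the same table. All concrete names below (table name, record key, lock table, region) are placeholder assumptions for illustration, not values from this issue.

```python
# Hypothetical sketch: Hudi writer options for multiple concurrent
# insert_overwrite writers on S3, using optimistic concurrency control
# with a DynamoDB-based lock provider.
hudi_options = {
    # Basic table settings (table/field names are placeholders).
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "partition_column",
    "hoodie.datasource.write.operation": "insert_overwrite",
    # Multi-writer safety: OCC plus an external lock provider; without
    # these, Hudi assumes a single writer per table.
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider":
        "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi-locks",        # placeholder
    "hoodie.write.lock.dynamodb.partition_key": "my_table",  # placeholder
    "hoodie.write.lock.dynamodb.region": "us-east-1",        # placeholder
}

# Each Glue job instance would then write its batch with something like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(s3_path)
```

Note that with insert_overwrite, each commit replaces the file groups in the partitions it touches, so concurrent writers to the same partition can still replace each other's output even when OCC prevents corruption; the options above only make concurrent writing safe, not order-independent.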
