sydneyhoran commented on issue #6316:
URL: https://github.com/apache/hudi/issues/6316#issuecomment-1460731185

   I added a workaround for this issue in my local fork of Hudi. Small tweaks 
to 
[HoodieAsyncService.java](https://github.com/sydneyhoran/hudi/blob/20f182d82e020ecd30fc1546ea0a4a6116276195/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/async/HoodieAsyncService.java#L128)
 and 
[HoodieMultiTableDeltaStreamer.java](https://github.com/sydneyhoran/hudi/blob/20f182d82e020ecd30fc1546ea0a4a6116276195/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L405)
were required, and it now works as expected. The executor shutdown timeout of 24 hours was causing the job to hang, so I changed it to 10 seconds with no negative consequences. I also enabled `--post-write-termination-strategy-class org.apache.hudi.utilities.deltastreamer.NoNewDataTerminationStrategy`, so MultiTableDeltaStreamer can move on to the next table after N rounds with no new data (`max.rounds.without.new.data.to.shutdown`) instead of staying stuck on the first table until there is an error.
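
   For reference, the timeout tweak is just the usual short-await executor shutdown pattern; a minimal sketch is below (plain Java illustrating the idea, not the exact HoodieAsyncService code, which differs between Hudi versions; the 10-second value is the one I settled on):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {
  // Sketch: wait a short time for the async service's executor to drain,
  // then force-stop it. Waiting up to 24 hours is what made the job hang;
  // a short timeout lets the multi-table loop move on.
  static void shutdownQuietly(ExecutorService executor) {
    executor.shutdown();
    try {
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        executor.shutdownNow(); // give up on pending work after the short wait
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      executor.shutdownNow();
    }
  }
}
```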
   
   It may not fully solve all use cases, for example if you aren't expecting the no-new-data condition in every table of the multi-table job. Some use cases might need a new PostWriteTerminationStrategy, or a refactor of how the loop functions; a rough sketch of what such a strategy could look like follows.
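
   Purely as an illustration of that idea (the real PostWriteTerminationStrategy interface may have a different method signature in your Hudi version, so the class and method names here are hypothetical, not a drop-in implementation):

```java
// Hypothetical sketch of a "stop after N rounds without new data" policy,
// similar in spirit to NoNewDataTerminationStrategy. Names are made up for
// illustration; check the PostWriteTerminationStrategy interface in your
// Hudi version for the real contract.
public class MaxEmptyRoundsStrategy {
  private final int maxRoundsWithoutNewData;
  private int roundsWithoutNewData = 0;

  public MaxEmptyRoundsStrategy(int maxRoundsWithoutNewData) {
    this.maxRoundsWithoutNewData = maxRoundsWithoutNewData;
  }

  // Call once per ingestion round; hadNewData says whether the round wrote anything.
  public boolean shouldShutdown(boolean hadNewData) {
    if (hadNewData) {
      roundsWithoutNewData = 0;  // reset on progress
    } else {
      roundsWithoutNewData++;    // another empty round
    }
    return roundsWithoutNewData >= maxRoundsWithoutNewData;
  }
}
```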
   
   It does not go back to the beginning and continuously loop over the full set of tables, because once the last table hits NoNewData the Spark job ends. So I am handling that restart within the job orchestration.

