nsivabalan commented on PR #7469:
URL: https://github.com/apache/hudi/pull/7469#issuecomment-1690616808

   If I am not wrong, this is the core problem we are trying to solve: 
   if there are failed commits and two concurrent writers try to roll them 
back at the same time, we don't have a lock to coordinate them. 
   
   These complications arise only because Hudi tries to clean up failed 
writes automatically. In other similar systems, you may have to trigger 
explicit commands to clean up partially failed commits, or coordinate the 
cleanup yourself when multiple writers are involved. 
   
   Wanted to call that out. 
   Anyway, coming back to the original issue: it's recommended to disable 
table services (like cleaner, archival) in all writers except one, so we won't 
end up in such conflicts. These services are not latency sensitive anyway. And 
with this approach all other writers will be even faster, since they don't 
trigger any of these table services and only take care of ingestion. 
   
   We do have a table level config to disable all table services 
   https://hudi.apache.org/docs/configurations/#hoodietableservicesenabled 
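   
   As a rough sketch of the setup above (not taken from the PR), the write 
options could be split so that only one writer keeps table services on. The 
helper name and variables are hypothetical, and the concurrency and 
failed-writes keys are my assumption of what a multi-writer setup typically 
carries, so please verify them against the Hudi version in use. 

```python
# Illustrative only: the helper and variable names are hypothetical, not Hudi APIs.
# The option keys are Hudi write configs; verify them against your Hudi version.

def writer_options(runs_table_services: bool) -> dict:
    """Build write options for one writer in a multi-writer setup."""
    return {
        # Multi-writer mode. A lock provider must also be configured (omitted here).
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Assumption: failed writes are cleaned up lazily rather than eagerly.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        # Only the designated writer keeps cleaner, archival, etc. enabled.
        "hoodie.table.services.enabled": str(runs_table_services).lower(),
    }


# Exactly one writer runs table services; every other writer only ingests.
service_writer = writer_options(runs_table_services=True)
ingest_writer = writer_options(runs_table_services=False)
```

   Each dict would then be passed as the options of the corresponding 
writer's Hudi write. 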
   
   Having said all this, here is how I feel we could fix this issue. 
   
   We can leverage heartbeats, such that rollback commits also start to 
emit heartbeats. That way, a concurrent writer knows whether some other writer 
is currently executing the rollback or whether it's in a failed state, and 
only one writer will go ahead and execute the rollback while the others step 
away. 
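   
   To make the heartbeat idea concrete, here is a minimal sketch of the 
decision each writer would make. This is not Hudi's actual heartbeat client: 
`HeartbeatStore`, `should_execute_rollback`, and the timeout value are all 
hypothetical, and the check-then-claim step is not atomic here, so a real 
implementation would still need to make it atomic. 

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatStore:
    """Hypothetical stand-in for wherever rollback heartbeats are persisted."""
    timeout_secs: float
    # Maps a rollback instant to the timestamp of its last heartbeat.
    last_beat: dict = field(default_factory=dict)

    def beat(self, instant: str, now: float) -> None:
        """Record a heartbeat for the given rollback instant."""
        self.last_beat[instant] = now

    def is_alive(self, instant: str, now: float) -> bool:
        """A rollback is alive if it emitted a heartbeat within the timeout."""
        ts = self.last_beat.get(instant)
        return ts is not None and (now - ts) <= self.timeout_secs


def should_execute_rollback(store: HeartbeatStore, instant: str, now: float) -> bool:
    """Return True if this writer should run the rollback, False to step away."""
    if store.is_alive(instant, now):
        # Another writer is actively executing this rollback: step away.
        return False
    # No heartbeat, or it expired (the previous executor failed): take over.
    # NOTE: this check-then-claim is not atomic in this sketch.
    store.beat(instant, now)
    return True
```

   With a 60 second timeout, the first writer to see the rollback claims it, 
a second writer arriving 30 seconds later steps away, and a writer arriving 
after the heartbeat has expired takes over the rollback. 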
   
   I remember @suryaprasanna wanted to fix this, if I am not wrong.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
