nsivabalan commented on PR #7469:
URL: https://github.com/apache/hudi/pull/7469#issuecomment-1690616808
If I am not wrong, this is the core problem we are trying to solve: if there are failed commits and two concurrent writers try to roll them back at the same time, we don't have a lock as such.

These complications arise only because Hudi tries to clean up failed writes automatically. In other similar systems, you may have to trigger explicit commands to clean up partially failed commits, or coordinate when multiple writers are involved. Wanted to call it out.

Anyway, coming back to the original issue. It's recommended to disable table services (like cleaner and archival) in all writers except one, so we won't end up in such conflicts. These services are not latency sensitive anyway. And with this approach all other writers will be faster, since they don't trigger any of these table services and only take care of ingestion. We do have a table-level config to disable all table services: https://hudi.apache.org/docs/configurations/#hoodietableservicesenabled

Having said all this, here is how I feel we could fix this issue. We can leverage heartbeats, such that rollback commits also start to emit heartbeats. That way a concurrent writer knows whether some other writer is currently executing the rollback or whether the rollback itself is in a failed state. Only one writer will go ahead and execute the rollback, while the others step away.

I remember @suryaprasanna wanted to fix this, if I am not wrong.
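To make the heartbeat proposal concrete, here is a minimal sketch of how writers could decide who executes a rollback. It is not Hudi's implementation: `RollbackHeartbeats`, `try_claim`, `beat` and `timeout_s` are hypothetical names, the in-memory dict stands in for heartbeat files on storage, and the `threading.Lock` stands in for whatever atomic check-and-claim step the storage layer would have to provide.

```python
import threading
import time


class RollbackHeartbeats:
    """Hypothetical in-memory stand-in for rollback heartbeats (not a Hudi API)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        # Assumption: models an atomic check-and-claim on shared storage.
        self._lock = threading.Lock()
        # failed commit instant -> (writer_id, time of last heartbeat)
        self._beats = {}

    def try_claim(self, instant, writer_id):
        """Return True if writer_id may roll back `instant`, False if it should step away."""
        now = self._clock()
        with self._lock:
            owner = self._beats.get(instant)
            if owner is not None:
                owner_id, last_beat = owner
                alive = (now - last_beat) < self._timeout_s
                if alive and owner_id != writer_id:
                    # Another writer is actively rolling this instant back.
                    return False
            # No owner, or the owner's heartbeat expired (its rollback failed).
            self._beats[instant] = (writer_id, now)
            return True

    def beat(self, instant, writer_id):
        """Refresh the heartbeat; only the current owner may do so."""
        with self._lock:
            owner = self._beats.get(instant)
            if owner is not None and owner[0] == writer_id:
                self._beats[instant] = (writer_id, self._clock())
```

With a 10 second timeout, a second writer calling `try_claim` for the same instant is told to step away while the first writer keeps calling `beat`, and takes over only once the first writer's heartbeat is older than the timeout.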
