nsivabalan removed a comment on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-923333856


   So, given this approach, we could also support async compaction and 
clustering in the metadata table.
   
   Here is what we could do.
   Everything stays the same for the data table, i.e.: 
   take locks and do conflict resolution for all regular writes, commit, and 
release locks. 
   take locks and do conflict resolution while scheduling compaction/clustering, 
and release locks. 
   take locks and commit compaction and clustering. 
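
   The three data table steps above can be sketched roughly as follows. This is 
only an illustration: `data_table_lock`, `timeline`, `resolve_conflicts` and the 
three functions are hypothetical names, not Hudi APIs, and the conflict check is 
a placeholder.

```python
import threading

# Hypothetical stand-ins for illustration only; none of these are Hudi APIs.
data_table_lock = threading.Lock()
timeline = []  # (instant, action, state) tuples, modified only under the lock


def resolve_conflicts(entries, instant):
    """Placeholder conflict check; a real one would compare touched file groups."""
    return True


def regular_write(instant):
    # Step 1: lock, resolve conflicts, commit, release.
    with data_table_lock:
        resolve_conflicts(timeline, instant)
        timeline.append((instant, "commit", "completed"))


def schedule_table_service(instant, kind):
    # Step 2: lock, resolve conflicts, schedule compaction/clustering, release.
    with data_table_lock:
        resolve_conflicts(timeline, instant)
        timeline.append((instant, kind, "requested"))


def commit_table_service(instant, kind):
    # Step 3: lock and commit the previously scheduled compaction/clustering.
    with data_table_lock:
        idx = timeline.index((instant, kind, "requested"))
        timeline[idx] = (instant, kind, "completed")
```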
   
   When it comes to the metadata table, we will enable multi-writer mode. (As 
of this patch, the metadata table supports only single-writer mode.)
   All writes to the metadata table happen within the data table lock, so new 
delta commits to the metadata table always come from a single writer 
implicitly. 
   After committing to the metadata table, we can schedule compaction and 
cleaning if anything is due. Internally this takes the metadata table lock and 
checks for conflicts, but since there are no other writers, we should be 
good. 
   Once the commit and scheduling complete, we return to the data table, make 
the commit, and release the lock.
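
   A minimal sketch of the ordering described above, assuming two independent 
locks. The names (`data_table_lock`, `metadata_table_lock`, `events`, 
`commit_to_data_table`) are hypothetical and only record the sequence of steps; 
they do not reflect the actual implementation in this patch.

```python
import threading

# Hypothetical locks and event log, for illustration only.
data_table_lock = threading.Lock()
metadata_table_lock = threading.Lock()
events = []


def schedule_metadata_table_services():
    # Scheduling takes the metadata table's own lock. No other writer is
    # active at this point, so the conflict check is expected to pass.
    with metadata_table_lock:
        events.append("metadata: schedule compaction/cleaning")


def commit_to_data_table(instant):
    with data_table_lock:
        # The metadata delta commit runs inside the data table lock, which
        # makes it a single writer implicitly.
        events.append(f"metadata: delta commit {instant}")
        schedule_metadata_table_services()
        # Only after the metadata table work is done do we commit the data table.
        events.append(f"data: commit {instant}")
    # Leaving the `with` block releases the data table lock.
```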
   
   Later, when async compaction for the metadata table is about to be 
committed, we take the metadata table lock and make the commit. This ensures 
that it does not collide with regular delta commits to the metadata table. We 
may not need any conflict resolution here, similar to how it is done in the 
data table. 
   But one major issue we need to fix here is the ConflictResolutionStrategy: 
as of now, if there are any pending or completed compactions after the current 
commit of interest, writes will fail. Since all of them operate on the same 
partition with one file group, there will definitely be a conflict. So, just 
for the metadata table, we might want to consider a special conflict 
resolution strategy that treats only new writes as conflicts (and not any 
scheduled compaction). I need to understand the implications of this in finer 
detail, but I am just putting it out here. 
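
   To make the idea concrete, here is a rough sketch of the two behaviours. It 
is not the real ConflictResolutionStrategy interface: `Instant`, 
`conflicting_instants` and the `ignore_pending_compaction` flag are hypothetical. 
Treating "scheduled" as any compaction that is not yet completed is also an 
assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    timestamp: str  # fixed-width string, so string comparison orders instants
    action: str     # e.g. "deltacommit" or "compaction"
    state: str      # "requested", "inflight" or "completed"


def conflicting_instants(current, timeline, ignore_pending_compaction):
    """Return instants after `current` that should be treated as conflicts."""
    later = [i for i in timeline if i.timestamp > current.timestamp]
    if ignore_pending_compaction:
        # Assumed metadata-table-only rule: a compaction that is merely
        # scheduled (not completed) does not count as a conflict.
        later = [
            i for i in later
            if not (i.action == "compaction" and i.state != "completed")
        ]
    return later
```

   With the flag off, a scheduled compaction after the current instant is 
reported as a conflict, which matches the behaviour described as the problem. 
With the flag on, only the newer write is reported.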
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

