nsivabalan removed a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-923333856
So, given this approach, we could also support async compaction and clustering in the metadata table. Here is what we could do.

Everything stays the same with respect to the data table, i.e.:
- Take locks and do conflict resolution for all regular writes, commit, and release locks.
- Take locks and do conflict resolution while scheduling compaction/clustering, and release locks.
- Take locks and commit compaction and clustering.

When it comes to the metadata table:
- We will enable multi-writer mode in the metadata table. (As of this patch, we have only single-writer mode for the metadata table.)
- All writes to the metadata table happen within the data table lock, so with respect to new delta commits to the metadata table, there is always implicitly a single writer.
- After committing to the metadata table, we can just schedule compaction and cleaning if something is available. Internally this will take the metadata table lock and check for conflicts, but since there are no other writers, we should be good.
- Once the commit and scheduling complete, we return to the data table, make the commit, and release the lock.
- Later, when async compaction for the metadata table is about to be committed, we take the metadata table lock and make the commit. This ensures it does not collide with regular delta commits happening to the metadata table. We may not invoke any conflict resolution here, similar to how it is done in the data table.

But one major issue we need to fix here is the ConflictResolutionStrategy: as of now, if there are any pending or completed compactions after the current commit of interest, writes will fail. Since all of them operate on the same partition with one file group, there will definitely be a conflict. So, just for the metadata table, we might want to consider a special conflict resolution strategy that treats only new writes as conflicts and not any scheduled compaction. I need to understand the implications of this in finer detail, but I'm just putting it out here.
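To make the proposed strategy change concrete, here is a rough Python sketch of the two conflict-candidate rules described above. It is not Hudi code: `Instant`, `default_conflicts`, `metadata_conflicts`, and the action names are hypothetical stand-ins invented for illustration, and treating only completed delta commits as "new writes" is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action names for this sketch; not Hudi's actual constants.
DELTA_COMMIT = "deltacommit"
COMPACTION = "compaction"


@dataclass(frozen=True)
class Instant:
    timestamp: str   # fixed-width, so string comparison orders instants
    action: str      # DELTA_COMMIT or COMPACTION
    completed: bool  # False means scheduled or inflight


def default_conflicts(timeline: List[Instant], current: Instant) -> List[Instant]:
    """Current behaviour as described in the comment: after the commit of
    interest, completed writes and any pending or completed compaction
    count as conflict candidates."""
    return [
        i for i in timeline
        if i.timestamp > current.timestamp
        and (i.action == COMPACTION or i.completed)
    ]


def metadata_conflicts(timeline: List[Instant], current: Instant) -> List[Instant]:
    """Proposed metadata-table rule (assumption): only new completed delta
    commits count; scheduled or completed compactions are ignored."""
    return [
        i for i in timeline
        if i.timestamp > current.timestamp
        and i.action == DELTA_COMMIT
        and i.completed
    ]


# Example timeline: a scheduled compaction and a later delta commit both
# land after the write at "002".
timeline = [
    Instant("001", DELTA_COMMIT, True),
    Instant("003", COMPACTION, False),
    Instant("004", DELTA_COMMIT, True),
]
current = Instant("002", DELTA_COMMIT, False)
```

With this timeline, the default rule flags both the scheduled compaction and the later delta commit, while the metadata-only rule flags just the delta commit. Whether ignoring compactions is actually safe is the open question raised above.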
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
