GitHub user danny0405 edited a comment on the discussion: RocksDB as The Replica of MDT/RLI
> I don't like how we are coupling index choices and concurrency models.

Yeah, the simple bucket index is currently required to implement NBCC, and we may need a more flexible and general design for concurrent modifications in streaming concurrent-write scenarios.

> can you please explain in detail, how the failover and OCC handling are related.

I think we can categorize the concurrent write cases into two: writes with conflicts and writes without conflicts.

1. If the write detects conflicts, the whole job/task triggers a failover and the RocksDB replica re-bootstraps from scratch, which ensures the consistency of the index backend, akin to the MDT RLI index. This requires enabling early conflict detection: https://github.com/apache/hudi/pull/6133. Pre-commit conflict resolution does not work well for Flink streaming because it happens after a successful checkpoint: Hudi deems the write failed if there is a conflict, while Flink deems the write successful (as of the latest successful checkpoint). To close this gap, early conflict resolution is required here.
2. If the write does not detect conflicts, there are still cases where another concurrent writer has modified the table with new record locations. The solution is that we might need an early check of the index backend's freshness before each write: maintain a mapping from job-id to instant time so that we can incrementally load the index changes made by concurrent writers (put the job-id in the commit metadata or maintain it on the coordinator).

This introduces a lot of complexity, though. I'm hoping for a more general solution for NBCC that is index-type agnostic and does not get stuck in this concurrent-index trap.
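As a rough illustration of the freshness check in case 2, here is a minimal Python sketch of one possible reading of the job-id to instant-time mapping. It is not Hudi code: `Commit`, `LocalIndexReplica`, and `refresh` are hypothetical names, a plain dict stands in for the RocksDB replica, and it assumes each completed commit exposes its writer's job-id and the record locations it wrote.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    """Hypothetical view of one completed instant on the timeline."""
    instant_time: str                  # assumed to sort lexicographically
    job_id: str                        # assumed to be stored in commit metadata
    record_locations: Dict[str, str]   # record key -> file group id


@dataclass
class LocalIndexReplica:
    """Stand-in for the local replica: record key -> file group id."""
    job_id: str
    locations: Dict[str, str] = field(default_factory=dict)
    # Foreign job-id -> latest instant time already applied from that job.
    last_applied: Dict[str, str] = field(default_factory=dict)

    def refresh(self, timeline: List[Commit]) -> int:
        """Apply location changes made by other writers since the last sync.

        Returns the number of foreign commits applied.
        """
        applied = 0
        for commit in sorted(timeline, key=lambda c: c.instant_time):
            if commit.job_id == self.job_id:
                continue  # our own writes are already reflected locally
            if commit.instant_time <= self.last_applied.get(commit.job_id, ""):
                continue  # already loaded on an earlier refresh
            self.locations.update(commit.record_locations)
            self.last_applied[commit.job_id] = commit.instant_time
            applied += 1
        return applied
```

In this sketch a writer would call `refresh` before each write, so only commits newer than the recorded instant per foreign job are replayed instead of re-bootstrapping the whole replica. Where the mapping lives (commit metadata or the coordinator) is left open, as in the comment above.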
Here is the table of support for concurrent modifications with Flink RLI:

| use case/concurrency mode | OCC | NBCC |
|---|---|---|
| write & write | Y (with early conflict detection and index refreshing) | N |
| write & compaction | Y | N |
| write & clustering | Y (with early index refreshing) | N |

GitHub link: https://github.com/apache/hudi/discussions/18296#discussioncomment-16171800
