fuyou001 commented on PR #10519: URL: https://github.com/apache/rocketmq/pull/10519#issuecomment-4741428792
**[P1] Make cache ownership transfer atomic with concurrent cleanup** `PopConsumerCache.writeAndDeleteRecords` first classifies an old CK with `consumerRecords.contains(record)`, then performs the RocksDB write/delete batch, and only afterward calls `deleteRecords(bufferDeleteRecords)`. This check and removal are not atomic with `cleanupRecords`. A cleanup thread can move the old CK from `recordTreeMap` to `removeTreeMap`, flush it to RocksDB, and clear it from the cache after `contains` returned true but before the final cache deletion. The renewal path then writes the new CK without including the old CK in `storeDeleteRecords`; its final cache deletion fails, and that failure result is ignored. RocksDB can therefore retain both the old and new CK, potentially reviving the same message more than once. If the keys coincide, a late cleanup write can also overwrite the new value with the old one. Please coordinate cleanup and renewal through an atomic claim/removal or another explicit ownership state transition before deciding whether the old CK must be deleted from RocksDB. A concurrent regression test should pause between classification and deletion while cleanup stages and flushes the same record. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
