ChenSammi commented on pull request #2538: URL: https://github.com/apache/ozone/pull/2538#issuecomment-904274915
> > > > > > The issue eventually turned out to be race among write chunk/readChunk threads all using the same file channel. With concurrent writers and readers on the same file, the behaviour seems unpredictable and leads to block file sparseness on disk. > > > > > > > > > @bshashikant , for ContainerStateMachine#read function,I'm not clear about one thing,when stateMachineDataCache.get(entry.getIndex()) return null, why we cannot think that the data is already written to the disk. > > > Thanks @ChenSammi . Usually, the entry will be in stateMachine cahe till the write chunk is applied which means, if a redaStateMachine comes in between till write chunk is not finished, it should be in the cache. But the cache has the entry added only for the leader. Now assume a case like this: 3 servers s1, s2 and s3. S1 being leader, s2 is fast follower and s3 being the slow follower. S1 sends entry 1 - 100 to S2 and S3 and then S1 dies, S2 has received all the log entries, (yet to complete the write to disk part ) being a follower doesn't update the data cache. Now S2 will be sending append Entries to S3,. S3 might ask for entries from s2 for which the write to disk is not completed neither these entries are in cache, as this node was follower when it received those entries. > > > BTW, if read chunk gets executed on a thread which is different than the thread which has done write chunk, it doesn't get the same view of the underlying file as the file handle is not closed on write. So, even if Read chunk gets executed post the write chunk completes, it may not necessarily see the same data unless its the same thread. That's why its important to remove the concurrency across multiple chunk writes/read for the same block. > > > > > > I didn't aware that stateMachineDataCache is only for leader. Now I got the point. I'm wondering if we can have such an cache for follower too. It seems the data is still in memory before the write finish. The cache just holds a reference to this data. What do you think? > > Besides, several changes in your previous commit are also good improvement to current code, such as the way of chunkExecutors allocation, and put raftFuture instead of writeChunkFuture in the writeChunkFutureMap. > > Well, from my point of view, followers maintaining cache as well to avoid few read calls in advent of failure case may not be the right thing to do. Performance improvement efforts can be done with focus on streaming rather. > > > Besides, several changes in your previous commit are also good improvement to current code, such as the way of chunkExecutors allocation, and put raftFuture instead of writeChunkFuture in the writeChunkFutureMap. > > The idea is to go back and study the perf characteristics with this change and also about file channel concurrency model as well. All other changes can be done as an afterthought of that. For now, i would like to stick to solve the data integrity issue. Agree, the improvement is good to have but not urgent, we can get them back later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
