ChenSammi commented on pull request #2538:
URL: https://github.com/apache/ozone/pull/2538#issuecomment-904274915


   > 
   > 
   > > > > The issue eventually turned out to be race among write 
chunk/readChunk threads all using the same file channel. With concurrent 
writers and readers on the same file, the behaviour seems unpredictable and 
leads to block file sparseness on disk.
   > > > 
   > > > 
   > > > @bshashikant , for ContainerStateMachine#read function,I'm not clear 
about one thing,when stateMachineDataCache.get(entry.getIndex()) return null, 
why we cannot think that the data is already written to the disk.
   > > > Thanks @ChenSammi . Usually, the entry will be in stateMachine cahe 
till the write chunk is applied which means, if a redaStateMachine comes in 
between till write chunk is not finished, it should be in the cache. But the 
cache has the entry added only for the leader. Now assume a case like this: 3 
servers s1, s2 and s3. S1 being leader, s2 is fast follower and s3 being the 
slow follower. S1 sends entry 1 - 100 to S2 and S3 and then S1 dies, S2 has 
received all the log entries, (yet to complete the write to disk part ) being a 
follower doesn't update the data cache. Now S2 will be sending append Entries 
to S3,. S3 might ask for entries from s2 for which the write to disk is not 
completed neither these entries are in cache, as this node was follower when it 
received those entries.
   > > > BTW, if read chunk gets executed on a thread which is different than 
the thread which has done write chunk, it doesn't get the same view of the 
underlying file as the file handle is not closed on write. So, even if Read 
chunk gets executed post the write chunk completes, it may not necessarily see 
the same data unless its the same thread. That's why its important to remove 
the concurrency across multiple chunk writes/read for the same block.
   > > 
   > > 
   > > I didn't aware that stateMachineDataCache is only for leader. Now I got 
the point. I'm wondering if we can have such an cache for follower too. It 
seems the data is still in memory before the write finish. The cache just holds 
a reference to this data. What do you think?
   > > Besides, several changes in your previous commit are also good 
improvement to current code, such as the way of chunkExecutors allocation, and 
put raftFuture instead of writeChunkFuture in the writeChunkFutureMap.
   > 
   > Well, from my point of view, followers maintaining cache as well to avoid 
few read calls in advent of failure case may not be the right thing to do. 
Performance improvement efforts can be done with focus on streaming rather.
   > 
   > > Besides, several changes in your previous commit are also good 
improvement to current code, such as the way of chunkExecutors allocation, and 
put raftFuture instead of writeChunkFuture in the writeChunkFutureMap.
   > 
   > The idea is to go back and study the perf characteristics with this change 
and also about file channel concurrency model as well. All other changes can be 
done as an afterthought of that. For now, i would like to stick to solve the 
data integrity issue.
   
   Agree, the improvement is good to have but not urgent, we can get them back 
later. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to