guihecheng commented on PR #3514:
URL: https://github.com/apache/ozone/pull/3514#issuecomment-1168143820

   > @guihecheng, I have discovered several issues with this. So, I have 
set this issue aside for now, as I am focusing on other prioritized tasks.
   > 
   > Here, just an empty putBlock will not work:
   > 
   > 1. The container may not be available. PutBlock will not create the 
container if it is absent. Containers are usually created on the first write 
chunk. Since we don't have a write chunk, empty putBlocks will just fail.
   > 2. The last putBlock with the close flag set to true will fail even though 
the container is available, as the chunkFile may not exist. The close flag 
forces the content to be flushed to the OS file system at the DN.
   > 
   > So, the best way could be to create the containers from the client when 
initializing the streams. Here I found a problem: the containers may already 
exist, and the existing createContainer API will just fail when the container 
already exists. So, we may need a createContainer API which does not fail when 
the container exists. Maybe we need an additional flag in the createContainer 
API?
   > 
   > With offline recovery, we always create the container on the target 
nodes first. So, even though no blocks need to be recovered, we may create the 
container. So, the first recovery task will solve the problem, and from then 
onward we will have empty containers, I guess. The only thing we need to make 
sure of is that empty containers do not get deleted by RM in the EC case. With 
this, I felt this can be given low priority until the RM work is done.
   > 
   > Doing it as part of close or at the start of the input stream should not 
make any difference. In both cases, createContainer will fail if the container 
already exists.
   > 
   > Please note that the current code is just experimental changes, not for 
commit or review. Feel free to suggest or take up any other ideas you have that 
may be simpler and lower risk.
   
   Thanks for the info. This is indeed a problem that needs more discussion: 
since the EC on-disk container replica is different from the Ratis-replicated 
container replica, some old assumptions (e.g. creating the container replica 
on the first WriteChunk) may no longer apply.
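   
   To make the old assumption concrete, here is a toy in-memory model of the 
behaviour described in the quoted comment. It is only a sketch and not Ozone 
code: `ToyDatanode`, `writeChunk`, `putBlock` and `hasContainer` are 
hypothetical names, and the model assumes that only WriteChunk creates the 
replica while PutBlock never does.

   ```java
   import java.util.HashSet;
   import java.util.Set;

   // Toy model only: NOT the Ozone datanode API. All names are hypothetical.
   class ToyDatanode {
     private final Set<Long> containers = new HashSet<>();

     // Assumption from the discussion: the first WriteChunk creates the replica.
     void writeChunk(long containerId, byte[] data) {
       containers.add(containerId);
     }

     // Assumption from the discussion: PutBlock never creates a missing container.
     void putBlock(long containerId) {
       if (!containers.contains(containerId)) {
         throw new IllegalStateException("container " + containerId + " not found");
       }
     }

     boolean hasContainer(long containerId) {
       return containers.contains(containerId);
     }

     public static void main(String[] args) {
       ToyDatanode d1 = new ToyDatanode(); // receives the only data chunk
       ToyDatanode d2 = new ToyDatanode(); // receives no chunk (partial stripe)

       d1.writeChunk(1L, new byte[] {42});
       d1.putBlock(1L); // fine: the replica exists on d1

       try {
         d2.putBlock(1L); // empty putBlock against a node that saw no chunk
       } catch (IllegalStateException e) {
         System.out.println("empty putBlock failed: " + e.getMessage());
       }
     }
   }
   ```

   In this model a node that received no chunk of a partial stripe never gets 
a replica, which is the case the empty putBlock cannot cover on its own.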
   
   I should say that RM won't be able to trigger Offline Recovery when facing 
a CLOSED tiny container with possibly only one partial stripe, since it won't 
have gathered enough container replica reports. So we need to handle this 
problem specially, on the write path or somewhere else. The problem is likely 
to show up when testing Offline Recovery with small files (e.g. 1MB) only; it 
may not be easy to reveal on real mixed workloads, but it is still possible.
   
   Creating container replicas on init of the BlockStreams is a workable idea, 
I think, but it may have a performance impact. I'm thinking that maybe we could 
handle this during the OPEN/CLOSE of the container/pipeline on the SCM side, 
but that also has problems: we don't have a synchronous view of container 
replica creation, and we can't wait for the creations.
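   
   As a rough sketch of the "create on stream init" idea, the toy store below 
contrasts a strict create that fails on an existing container with a tolerant 
create-if-absent variant. It is not the Ozone API: `ToyContainerStore`, 
`createContainer` and `createContainerIfAbsent` are hypothetical names used 
only to illustrate the semantics being discussed.

   ```java
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   // Toy model only: NOT the Ozone datanode API. All names are hypothetical.
   class ToyContainerStore {
     private final Set<Long> containers = ConcurrentHashMap.newKeySet();

     // Strict variant: mirrors the "fails when the container already exists" behaviour.
     void createContainer(long containerId) {
       if (!containers.add(containerId)) {
         throw new IllegalStateException("container " + containerId + " already exists");
       }
     }

     // Tolerant variant: returns true if newly created, false if it already existed.
     boolean createContainerIfAbsent(long containerId) {
       return containers.add(containerId);
     }

     boolean hasContainer(long containerId) {
       return containers.contains(containerId);
     }

     public static void main(String[] args) {
       ToyContainerStore store = new ToyContainerStore();

       // Two streams initializing against the same container can both call this safely.
       System.out.println("first init created: " + store.createContainerIfAbsent(7L));
       System.out.println("second init created: " + store.createContainerIfAbsent(7L));

       try {
         store.createContainer(7L); // the strict call fails on the existing container
       } catch (IllegalStateException e) {
         System.out.println("strict create failed: " + e.getMessage());
       }
     }
   }
   ```

   With a tolerant call like this, every stream could issue the create on init 
without coordinating with the others, at the cost of one extra request per 
stream, which is the performance concern mentioned above.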
   
   Let's dig into this more later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


