sodonnel commented on pull request #2963:
URL: https://github.com/apache/ozone/pull/2963#issuecomment-1015548867


   Prior to looking at this PR, I was not too sure how ICR and FCRs get 
generated and sent. Let me summarise my understanding and please correct me if 
I am wrong anywhere.
   
   ICRs get generated "on demand" as containers are added, deleted and change 
state on the datanode. They get added to the ICR queue on the datanode for 
sending later.
   
   FBRs get generated by the ReportPublisher based on a timer (60 minutes by 
default). Each time the timer expires, a new FCR is generated and stored into 
the "type2Reports" map inside StateContext.
   
   There is another map inside StateContext that has an entry for each endpoint 
receiving reports (multiple SCMs, Recon, etc) and effectively stores a boolean 
for each report type, indicating if it is ready to send or not.
   
   When the heartbeat is generated, every 30 seconds by default it will do the 
following for each endpoint:
   
   1. Add the full report if it is available to send, and then toggle the 
ready-to-send boolean to false.
   
   2. Add any ICRs queued and remove them from the queue.
   
   The issue we are trying to solve here, is that a single heartbeat can have a 
report which is slightly old, and contains the latest container id of 1000, but 
there is an ICR with container id 1001. When the report hits SCM, there are two 
threads processing. One handles ICRs and one FCRs, so the ICR goes first, adds 
1001 to SCM, but then the FCR is processed and removes it again.
   
   The solution in this PR, is to synchronize both ICR and FCR generation on 
the same object (container set). Then when we go to send a report, we check to 
see if the FCR is going to be sent too. If it is, we check for ICRs, remove 
them and then "refresh" the FCR, which looks to me like we just generate it 
again. While it does this, we block any new ICRs and hence new containers being 
created. The blocking time should not be too long as the number of containers 
is not too high generally.
   
   I also don't think we care if an ICR contains a duplicate to the FCR, 
provided it is always processed after the ICR. Eg we send a FCR with container 
1000 and then also send a ICR with 1000 too.
   
   My initial thought on the solution here, is that we are almost always going 
to have to regenerate the FCR - its very likely there will be at least 1 ICR in 
the heartbeat interval, and that means we are doubling the FCR generation.
   
   On the SCM side, if we can guarantee that FCRs are processed before the ICRs 
in the same heartbeat and also in order of heartbeats, we don't really care if 
there are ICRs in the heartbeat, that were generated after the FCR.
   
   Therefore, could we synchronized on generating the FCR (the ReportPublisher 
task), and generate it and then remove the pending ICRs at the same time under 
the same lock. Then when we come to send it, we know that the FCR has 
everything that came before it, and any later ICRs are ok, provided we process 
them after the FCR on SCM.
   
   If we changed this to work as I suggested above, as things stand, the 
synchronization is not tight enough to rule out duplicates as we mutate the 
container object before we lock, and then synchronize on generating the ICR. 
For example, in KeyValueContainer.java:
   
   ```
     @Override
     public void markContainerForClose(Container container)
         throws IOException {
       container.writeLock();
       try {
         // Move the container to CLOSING state only if it's OPEN
         if (container.getContainerState() == State.OPEN) {
           container.markContainerForClose();  // -> FCR gets lock here.
           sendICR(container);  // -> We only lock here, so FCR *might* see the 
above change
         }
       } finally {
         container.writeUnlock();
       }
     }
   ```
   
   Its possible we change the container object, then the FCR gets the lock and 
generates a FCR with the changed state. Then sendICR gets the lock and sends 
the ICR. If we wanted to avoid duplicated, we would need synchronize before 
modifying the container. However I am not sure we would need to do that - does 
it matter if we send later duplicate ICRs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to