sodonnel commented on pull request #2963:
URL: https://github.com/apache/ozone/pull/2963#issuecomment-1015548867
Prior to looking at this PR, I was not too sure how ICR and FCRs get
generated and sent. Let me summarise my understanding and please correct me if
I am wrong anywhere.
ICRs get generated "on demand" as containers are added, deleted and change
state on the datanode. They get added to the ICR queue on the datanode for
sending later.
FBRs get generated by the ReportPublisher based on a timer (60 minutes by
default). Each time the timer expires, a new FCR is generated and stored into
the "type2Reports" map inside StateContext.
There is another map inside StateContext that has an entry for each endpoint
receiving reports (multiple SCMs, Recon, etc) and effectively stores a boolean
for each report type, indicating if it is ready to send or not.
When the heartbeat is generated, every 30 seconds by default it will do the
following for each endpoint:
1. Add the full report if it is available to send, and then toggle the
ready-to-send boolean to false.
2. Add any ICRs queued and remove them from the queue.
The issue we are trying to solve here, is that a single heartbeat can have a
report which is slightly old, and contains the latest container id of 1000, but
there is an ICR with container id 1001. When the report hits SCM, there are two
threads processing. One handles ICRs and one FCRs, so the ICR goes first, adds
1001 to SCM, but then the FCR is processed and removes it again.
The solution in this PR, is to synchronize both ICR and FCR generation on
the same object (container set). Then when we go to send a report, we check to
see if the FCR is going to be sent too. If it is, we check for ICRs, remove
them and then "refresh" the FCR, which looks to me like we just generate it
again. While it does this, we block any new ICRs and hence new containers being
created. The blocking time should not be too long as the number of containers
is not too high generally.
I also don't think we care if an ICR contains a duplicate to the FCR,
provided it is always processed after the ICR. Eg we send a FCR with container
1000 and then also send a ICR with 1000 too.
My initial thought on the solution here, is that we are almost always going
to have to regenerate the FCR - its very likely there will be at least 1 ICR in
the heartbeat interval, and that means we are doubling the FCR generation.
On the SCM side, if we can guarantee that FCRs are processed before the ICRs
in the same heartbeat and also in order of heartbeats, we don't really care if
there are ICRs in the heartbeat, that were generated after the FCR.
Therefore, could we synchronized on generating the FCR (the ReportPublisher
task), and generate it and then remove the pending ICRs at the same time under
the same lock. Then when we come to send it, we know that the FCR has
everything that came before it, and any later ICRs are ok, provided we process
them after the FCR on SCM.
If we changed this to work as I suggested above, as things stand, the
synchronization is not tight enough to rule out duplicates as we mutate the
container object before we lock, and then synchronize on generating the ICR.
For example, in KeyValueContainer.java:
```
@Override
public void markContainerForClose(Container container)
throws IOException {
container.writeLock();
try {
// Move the container to CLOSING state only if it's OPEN
if (container.getContainerState() == State.OPEN) {
container.markContainerForClose(); // -> FCR gets lock here.
sendICR(container); // -> We only lock here, so FCR *might* see the
above change
}
} finally {
container.writeUnlock();
}
}
```
Its possible we change the container object, then the FCR gets the lock and
generates a FCR with the changed state. Then sendICR gets the lock and sends
the ICR. If we wanted to avoid duplicated, we would need synchronize before
modifying the container. However I am not sure we would need to do that - does
it matter if we send later duplicate ICRs?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]