sodonnel commented on pull request #3928:
URL: https://github.com/apache/hadoop/pull/3928#issuecomment-1022221913
I am not sure it is safe to drop the read lock while iterating over the
blocks in a StorageInfo, as the blocks could change while the lock is released.
Looking at the code in DatanodeStorageInfo, we can add a block to the list,
and an add always goes to the head of the list. The worst case there is that we
fail to consider some block, but it would be caught by the rescan at the end,
so that is not a big problem.
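To make the add case concrete, here is a minimal sketch (using
`java.util.concurrent.ConcurrentLinkedDeque` as a stand-in, not the real
DatanodeStorageInfo block list) of why a block inserted at the head after the
iterator has already advanced is simply never visited on that pass:
```
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedDeque;

public class HeadInsertDemo {
  public static void main(String[] args) {
    ConcurrentLinkedDeque<String> blocks = new ConcurrentLinkedDeque<>();
    blocks.addFirst("blk_1");
    blocks.addFirst("blk_2");

    Iterator<String> it = blocks.iterator();   // starts at the current head: blk_2
    System.out.println(it.next());             // blk_2

    blocks.addFirst("blk_3");                  // new block added at the head mid-scan

    while (it.hasNext()) {
      System.out.println(it.next());           // prints blk_1 only; blk_3 is missed
    }
  }
}
```
The missed block here is harmless for the reason above: a rescan started after
the insert would begin at the new head and pick it up.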
However, we can also remove a block. Let's say we stop iterating at blockX and drop
the lock.
Then an IBR comes in and gets processed, removing that block from the storage.
The iterator will be holding a reference to that BlockInfo object as
`current`, and it is expected to call next on it to get the next BlockInfo.
The block removal code in BlockInfo.java looks like:
```
BlockInfo listRemove(BlockInfo head, DatanodeStorageInfo storage) {
  if (head == null) {
    return null;
  }
  int dnIndex = this.findStorageInfo(storage);
  if (dnIndex < 0) { // this block is not on the data-node list
    return head;
  }
  BlockInfo next = this.getNext(dnIndex);
  BlockInfo prev = this.getPrevious(dnIndex);
  this.setNext(dnIndex, null);
  this.setPrevious(dnIndex, null);
  if (prev != null) {
    prev.setNext(prev.findStorageInfo(storage), next);
  }
  if (next != null) {
    next.setPrevious(next.findStorageInfo(storage), prev);
  }
  if (this == head) { // removing the head
    head = next;
  }
  return head;
}
```
It basically unlinks itself from the linked list, setting its own
prev and next pointers to null. Then the iterator calls `getNext` on it:
```
BlockInfo getNext(int index) {
  assert this.triplets != null : "BlockInfo is not initialized";
  assert index >= 0 && index * 3 + 2 < triplets.length : "Index is out of bound";
  BlockInfo info = (BlockInfo) triplets[index * 3 + 2];
  assert info == null || info.getClass().getName().startsWith(
      BlockInfo.class.getName()) :
      "BlockInfo is expected at " + (index * 3 + 2);
  return info;
}
```
In the case where the current BlockInfo now has a null next, the iterator will
get null, think it has reached the end of the list, and exit early.
This would be a rare bug to hit, as you would need to drop the lock and then
the next block the iterator would consume would need to be removed from the list,
but it is one of those things that will happen from time to time and be very
hard to explain.
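To illustrate the failure mode, here is a small toy model (a hypothetical Node
class, not the real BlockInfo/DatanodeStorageInfo code) of an intrusive
doubly-linked list where removing the node the iterator is holding nulls its
next pointer and silently ends the scan:
```
public class EarlyExitDemo {
  static class Node {
    final String name;
    Node prev, next;
    Node(String name) { this.name = name; }
  }

  public static void main(String[] args) {
    // Build head -> blockX -> blockY
    Node head = new Node("head");
    Node blockX = new Node("blockX");
    Node blockY = new Node("blockY");
    head.next = blockX;   blockX.prev = head;
    blockX.next = blockY; blockY.prev = blockX;

    Node current = blockX;          // iteration paused here, lock dropped

    // Concurrent removal of blockX: relink the neighbours and null out
    // blockX's own pointers, the same net effect as listRemove()
    blockX.prev.next = blockX.next;
    blockX.next.prev = blockX.prev;
    blockX.next = null;
    blockX.prev = null;

    // Iterator resumes: asking the detached node for its next gives null,
    // so it concludes the list is finished and never visits blockY.
    System.out.println(current.next == null
        ? "iterator exits early, blockY never scanned"
        : "iterator continues");
  }
}
```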
5M blocks on a single storage is a lot - dropping and re-locking at the
storage level was supposed to handle DNs with millions of blocks in total, with
each individual storage only holding a smaller number.
Does this pause occur once per storage, at the start and end of decommission,
for each node?