sodonnel commented on pull request #3928:
URL: https://github.com/apache/hadoop/pull/3928#issuecomment-1022221913


   I am not sure it is safe to drop the read lock while iterating over the blocks in a StorageInfo, as the block list can change while the lock is not held.
   
   Looking at the code in DatanodeStorageInfo, a block can be added to the list, and it is always added at the head. The worst case there is that we fail to consider some block, but it would be caught by the rescan at the end, so that is not a big problem.
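   
   A minimal sketch of why that is benign (hypothetical classes, not the real BlockInfo/DatanodeStorageInfo code): the new node is linked in at the head, behind everything an in-flight iterator has already passed, so at worst the new block is simply not visited on this pass.
   
   ```java
   // Hypothetical, simplified model of the per-storage block list.
   // Insertion always happens at the head, so an iterator that is already
   // part-way down the list never sees the new node on this pass.
   class Node {
     Node prev, next;
   }
   
   class StorageBlockList {
     Node head;
   
     void addToHead(Node n) {
       n.prev = null;
       n.next = head;
       if (head != null) {
         head.prev = n;
       }
       head = n;
     }
   }
   ```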
   
   However, a block can also be removed. Let's say we stop iterating at blockX and drop the lock.
   
   Then an IBR (incremental block report) comes in and gets processed, removing that block from the storage.
   
   The iterator will still be holding a reference to that blockInfo object as `current`, and it will call `getNext` on it to get the next blockInfo.
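   
   For reference, the per-storage iterator advances roughly like this (a simplified sketch of the BlockIterator inner class in DatanodeStorageInfo, from memory, so treat the exact shape as approximate):
   
   ```java
   // Simplified sketch of DatanodeStorageInfo's BlockIterator: it holds the
   // current BlockInfo and advances by asking that object for the next block
   // in this storage's list.
   private class BlockIterator implements Iterator<BlockInfo> {
     private BlockInfo current;
   
     BlockIterator(BlockInfo head) {
       this.current = head;
     }
   
     public boolean hasNext() {
       return current != null;
     }
   
     public BlockInfo next() {
       BlockInfo res = current;
       current = current.getNext(current.findStorageInfo(DatanodeStorageInfo.this));
       return res;
     }
   }
   ```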
   
   The remove block code in BlockInfo.java looks like:
   
   ```java
   BlockInfo listRemove(BlockInfo head, DatanodeStorageInfo storage) {
       if (head == null) {
         return null;
       }
       int dnIndex = this.findStorageInfo(storage);
       if (dnIndex < 0) { // this block is not on the data-node list
         return head;
       }
   
       BlockInfo next = this.getNext(dnIndex);
       BlockInfo prev = this.getPrevious(dnIndex);
       this.setNext(dnIndex, null);
       this.setPrevious(dnIndex, null);
       if (prev != null) {
         prev.setNext(prev.findStorageInfo(storage), next);
       }
       if (next != null) {
         next.setPrevious(next.findStorageInfo(storage), prev);
       }
       if (this == head) { // removing the head
         head = next;
       }
       return head;
     }
   ```
   
   The block unlinks itself from the linked list by relinking its neighbours around it, and sets its own prev and next pointers to null. Then the iterator calls `getNext` on it:
   
   ```java
  BlockInfo getNext(int index) {
    assert this.triplets != null : "BlockInfo is not initialized";
    assert index >= 0 && index * 3 + 2 < triplets.length : "Index is out of bound";
    BlockInfo info = (BlockInfo)triplets[index * 3 + 2];
    assert info == null || info.getClass().getName().startsWith(
        BlockInfo.class.getName()) :
        "BlockInfo is expected at " + (index * 3 + 2);
    return info;
  }
   ```
   
   Since the current blockInfo now has a null next pointer, the iterator gets null back, thinks it has reached the end of the list, and exits early, skipping the remaining blocks.
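   
   To make the failure concrete, here is a tiny, self-contained demonstration using a hypothetical Node class rather than the real BlockInfo, mimicking what listRemove does to the node the iterator is parked on:
   
   ```java
   public class EarlyExitDemo {
     static class Node {
       final int id;
       Node prev, next;
       Node(int id) { this.id = id; }
     }
   
     public static void main(String[] args) {
       // Head-insertion order gives the list 3 <-> 2 <-> 1.
       Node n1 = new Node(1), n2 = new Node(2), n3 = new Node(3);
       n3.next = n2; n2.prev = n3;
       n2.next = n1; n1.prev = n2;
   
       // The "iterator" has already consumed node 3 and is parked on node 2.
       Node current = n2;
   
       // While the lock is dropped, an IBR removes node 2: its neighbours are
       // relinked and its own pointers nulled, as listRemove does.
       n3.next = n1;
       n1.prev = n3;
       n2.next = null;
       n2.prev = null;
   
       // Iteration resumes. It visits node 2, follows its (now null) next
       // pointer and stops, so node 1 is never scanned.
       while (current != null) {
         System.out.println("visited " + current.id);
         current = current.next;
       }
     }
   }
   ```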
   
   This would be a rare bug to hit, as the lock would have to be dropped and the very next block due to be consumed from the list would then have to be removed, but it is one of those things that will happen from time to time and be very hard to explain.
   
   5M blocks on a single storage is a lot. Dropping and re-locking at the storage level was intended to handle DNs with millions of blocks in total, with each individual storage only holding a relatively small number.
   
   Does this pause occur once per storage at the start and end of decommission 
for each node?

