sodonnel commented on code in PR #4655:
URL: https://github.com/apache/ozone/pull/4655#discussion_r1228397909
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java:
##########
@@ -181,21 +182,37 @@ public static void removeContainer(KeyValueContainerData
containerData,
/**
* Returns if there are no blocks in the container.
+ * @param store DBStore
* @param containerData Container to check
+ * @param bCheckChunksFilePath Whether to check chunksfilepath has any blocks
* @return true if the directory containing blocks is empty
* @throws IOException
*/
- public static boolean noBlocksInContainer(KeyValueContainerData
- containerData)
+ public static boolean noBlocksInContainer(DatanodeStore store,
+ KeyValueContainerData
+ containerData,
+ boolean bCheckChunksFilePath)
throws IOException {
+ Preconditions.checkNotNull(store);
Preconditions.checkNotNull(containerData);
- File chunksPath = new File(containerData.getChunksPath());
- Preconditions.checkArgument(chunksPath.isDirectory());
-
- try (DirectoryStream<Path> dir
- = Files.newDirectoryStream(chunksPath.toPath())) {
- return !dir.iterator().hasNext();
+ if (containerData.isOpen()) {
+ return false;
+ }
+ try (BlockIterator<BlockData> blockIterator =
+ store.getBlockIterator(containerData.getContainerID())) {
+ if (blockIterator.hasNext()) {
+ return false;
+ }
}
+ if (bCheckChunksFilePath) {
Review Comment:
Today I came across a scenario:
1. A client tries to write an EC block, but at roughly the same time a node
it was about to write to is stopped.
2. The client tries to write out the stripe to the data and parity nodes,
but it fails to reach all of them as one node is down. The client notices this,
abandons the stripe, and writes the data again to a new container.
3. Now we have a case where some containers have a chunk written, but no "put
block" was executed, so RocksDB has no entry for the chunk / block.
4. The container is reported to SCM and closed (triggered by the stale node
handler closing the pipeline) with zero keys and zero bytes (due to the missing
put block).
5. The container goes into delete handling in SCM, which sends a delete
command over and over, as the DN keeps skipping it because the container is
not empty.
The existing check feels too conservative, as this is a scenario that can
legitimately happen, and we need to be able to remove these replicas.
I am wondering whether the changes here would help, or whether the behavior
would be the same, leaving the container stuck in the DELETING state forever?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]