linyiqun commented on a change in pull request #1815:
URL: https://github.com/apache/ozone/pull/1815#discussion_r561927986
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
A little confused with this example. startKey is used to list keys which
are behinds this key. "a/b" should not be expected in the return list, right? I
also checked the logic in getDirectories method. It also doesn't use startKey
to do the key comparison,
[KeyManagerImpl.java#L2633](https://github.com/apache/ozone/blob/HDDS-2939/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2633)
##########
File path:
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestObjectStoreV1.java
##########
@@ -78,107 +90,143 @@ public static void init() throws Exception {
.setOmId(omId)
.build();
cluster.waitForClusterToBeReady();
+ // create a volume and a bucket to be used by OzoneFileSystem
+ OzoneBucket bucket = TestDataUtil.createVolumeAndBucket(cluster);
+ volumeName = bucket.getVolumeName();
+ bucketName = bucket.getName();
+
+ String rootPath = String.format("%s://%s.%s/",
+ OzoneConsts.OZONE_URI_SCHEME, bucket.getName(),
+ bucket.getVolumeName());
+ // Set the fs.defaultFS and start the filesystem
+ conf.set(CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY, rootPath);
+ // Set the number of keys to be processed during batch operate.
+ conf.setInt(OZONE_FS_ITERATE_BATCH_SIZE, 5);
+ fs = FileSystem.get(conf);
+ }
+
+ @After
+ public void tearDown() throws Exception {
+ deleteRootDir();
+ }
+
+ /**
+ * Cleanup files and directories.
+ *
+ * @throws IOException DB failure
+ */
+ private void deleteRootDir() throws IOException {
+ Path root = new Path("/");
+ FileStatus[] fileStatuses = fs.listStatus(root);
+
+ if (fileStatuses == null) {
+ return;
+ }
+
+ for (FileStatus fStatus : fileStatuses) {
+ fs.delete(fStatus.getPath(), true);
+ }
+
+ fileStatuses = fs.listStatus(root);
+ if (fileStatuses != null) {
+ Assert.assertEquals("Delete root failed!", 0, fileStatuses.length);
+ }
}
@Test
public void testCreateKey() throws Exception {
- String volumeName = "volume" + RandomStringUtils.randomNumeric(5);
- String bucketName = "bucket" + RandomStringUtils.randomNumeric(5);
String parent = "a/b/c/";
Review comment:
Make the sense to me.
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
A little confused with this example. startKey is used to list keys which
are behinds this key. "a/b" should not be expected in the return list since a/b
is not behind the key a/file1, right? I also checked the logic in
getDirectories method. It also doesn't use startKey to do the key comparison,
[KeyManagerImpl.java#L2633](https://github.com/apache/ozone/blob/HDDS-2939/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2633)
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
A little confused with this example. startKey is used to list keys which
are behinds this key. "a/b" should not be expected in the return list since a/b
is not behind the key a/file1, right? I also checked the logic in
getDirectories method. It also doesn't use startKey to do the key comparison,
[KeyManagerImpl.java#L2633](https://github.com/apache/ozone/blob/HDDS-2939/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2633)
Current listKey logic in master branch, it use startKey as the seekKey when
startKey is not blank.
[OmMetadataManagerImpl.java#L812](https://github.com/apache/ozone/blob/3d70fab346663b8b3fed5c49e43cf684a33c3ade/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L812)
```java
boolean skipStartKey = false;
if (StringUtil.isNotBlank(startKey)) {
// Seek to the specified key.
seekKey = getOzoneKey(volumeName, bucketName, startKey);
skipStartKey = true;
} else {
// This allows us to seek directly to the first key with the right
prefix.
seekKey = getOzoneKey(volumeName, bucketName,
StringUtil.isNotBlank(keyPrefix) ? keyPrefix : OM_KEY_PREFIX);
}
...
while (iterator.hasNext()) {
Map.Entry< CacheKey<String>, CacheValue<OmKeyInfo>> entry =
iterator.next();
String key = entry.getKey().getCacheKey();
OmKeyInfo omKeyInfo = entry.getValue().getCacheValue();
// Making sure that entry in cache is not for delete key request.
if (omKeyInfo != null) {
if (key.startsWith(seekPrefix) && key.compareTo(seekKey) >= 0) {
cacheKeyMap.put(key, omKeyInfo);
}
} else {
deletedKeySet.add(key);
}
}
```
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
A little confused with this example. startKey is used to list keys which
are behinds this key. "a/b" should not be expected in the return list since a/b
is not behind the key a/file1, right? I also checked the logic in
getDirectories method. It also doesn't use startKey to do the key comparison,
[KeyManagerImpl.java#L2633](https://github.com/apache/ozone/blob/HDDS-2939/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2633)
seekKeyInDB is passed from prefix key not startKey.
Current listKey logic in master branch, it use startKey as the seekKey when
startKey is not blank.
[OmMetadataManagerImpl.java#L812](https://github.com/apache/ozone/blob/3d70fab346663b8b3fed5c49e43cf684a33c3ade/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L812)
```java
boolean skipStartKey = false;
if (StringUtil.isNotBlank(startKey)) {
// Seek to the specified key.
seekKey = getOzoneKey(volumeName, bucketName, startKey);
skipStartKey = true;
} else {
// This allows us to seek directly to the first key with the right
prefix.
seekKey = getOzoneKey(volumeName, bucketName,
StringUtil.isNotBlank(keyPrefix) ? keyPrefix : OM_KEY_PREFIX);
}
...
while (iterator.hasNext()) {
Map.Entry< CacheKey<String>, CacheValue<OmKeyInfo>> entry =
iterator.next();
String key = entry.getKey().getCacheKey();
OmKeyInfo omKeyInfo = entry.getValue().getCacheValue();
// Making sure that entry in cache is not for delete key request.
if (omKeyInfo != null) {
if (key.startsWith(seekPrefix) && key.compareTo(seekKey) >= 0) {
cacheKeyMap.put(key, omKeyInfo);
}
} else {
deletedKeySet.add(key);
}
}
```
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
Thanks for detailed explanation, @rakeshadr . Suppose there is a
different meaning of the startKey under separate file/dir table here. I will
take a deep look for this.
In order not to unblock subsequent PRs, I will give my approval here.
##########
File path:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
return fileStatusList;
}
+ @SuppressWarnings("methodlength")
public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
String startKey, long numEntries, String clientAddress)
throws IOException {
Preconditions.checkNotNull(args, "Key args can not be null");
// unsorted OMKeyInfo list contains combine results from TableCache and DB.
List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
- LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
if (numEntries <= 0) {
return fileStatusFinalList;
}
+
+ /**
+ * A map sorted by OmKey to combine results from TableCache and DB for
+ * each entity - Dir & File.
+ *
+ * Two separate maps are required because the order of seek -> (1)Seek
+ * files in fileTable (2)Seek dirs in dirTable.
+ *
+ * StartKey should be added to the final listStatuses, so if we combine
+ * files and dirs into a single map then directory with lower precedence
+ * will appear at the top of the list even if the startKey is given as
+ * fileName.
+ *
+ * For example, startKey="a/file1". As per the seek order, first fetches
+ * all the files and then it will start seeking all the directories.
+ * Assume a directory name exists "a/b". With one map, the sorted list will
+ * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+ * startKey element should always be at the top of the listStatuses.
Review comment:
Thanks for detailed explanation, @rakeshadr . Suppose there is a
different meaning of the startKey under separate file/dir table here. I will
take a deep look for this.
In order not to unblock subsequent PRs, I will give my approval here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]