[GitHub] [ozone] ayushtkn commented on a diff in pull request #3554: HDDS-6951. Replace bucket.listKeys() with bucket.listStatus() in OmBucketReadWriteKeyOps

GitBox Mon, 01 Aug 2022 18:08:57 -0700


ayushtkn commented on code in PR #3554:
URL: https://github.com/apache/ozone/pull/3554#discussion_r935027796



##########
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OmBucketReadWriteKeyOps.java:
##########
@@ -128,12 +128,10 @@ protected String createPath(String path) {
   @Override
   protected int getReadCount(int readCount, String readPath)
       throws IOException {
-    Iterator<? extends OzoneKey> ozoneKeyIterator = bucket.listKeys(
-        OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX);
-    while (ozoneKeyIterator.hasNext()) {
-      ozoneKeyIterator.next();
-      ++readCount;
-    }
+    List<OzoneFileStatus> ozoneFileStatusList = bucket.listStatus(
+        OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX, true,
+        "/", keyCountForRead);

Review Comment:
   On a quick look, I think the client is controlling the number of entries 
here via ``keyCountForRead``.
   
   In the FileSystem code as well, it seems to be controlled at the client side:
   
https://github.com/apache/ozone/blob/a32549cbc1227291d6ebd972ce32db043a5c51d0/hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java#L608
   
   In HDFS, AFAIK, the NameNode imposes such limits itself and the client can't 
explicitly request a number of entries:
   
https://github.com/apache/hadoop/blob/123d1aa8846a2099b04fd8e8ed7f2bd6db4f36b5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java#L364-L366
   
   And the use case here seems to be just getting the ``readCount``, which 
doesn't actually require ``OzoneFileStatus``. It can easily be computed by 
looping with ``LISTING_PAGE_SIZE`` or a similar page size. At a scale of around 
1 million entries or beyond, you would use a lot of memory storing 
``FileStatus`` objects that you don't actually need. If I remember correctly, 
it took around 1.5 GB per million entries when I tried something similar with 
HDFSFileStatus in DistCp, which sadly builds up the FileStatus tree using a 
breadth-first traversal. It would surely be less for Ozone than for HDFS.
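   
   A minimal sketch of the paged counting idea, under stated assumptions: 
``PageLister``, ``countEntries`` and ``inMemory`` are hypothetical names used 
only for illustration and are not the Ozone client API. The point is only that 
the loop keeps at most one page of entries in memory while it counts.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedKeyCounter {

  /** Hypothetical stand-in for a paged listing call; NOT the Ozone API. */
  interface PageLister {
    // Returns up to pageSize names that sort strictly after startAfter
    // ("" means start from the beginning).
    List<String> listPage(String startAfter, int pageSize);
  }

  /** Counts entries page by page, holding at most one page at a time. */
  static long countEntries(PageLister lister, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    long count = 0;
    String startAfter = "";
    while (true) {
      List<String> page = lister.listPage(startAfter, pageSize);
      count += page.size();
      if (page.size() < pageSize) {
        // A short (or empty) page means the listing is exhausted.
        return count;
      }
      // Resume after the last name seen; the page itself can be discarded.
      startAfter = page.get(page.size() - 1);
    }
  }

  /** In-memory lister over sorted names, for demonstration only. */
  static PageLister inMemory(List<String> sortedKeys) {
    return (startAfter, pageSize) -> {
      List<String> page = new ArrayList<>();
      for (String key : sortedKeys) {
        if (key.compareTo(startAfter) > 0 && page.size() < pageSize) {
          page.add(key);
        }
      }
      return page;
    };
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      keys.add(String.format("key-%05d", i));
    }
    System.out.println(countEntries(inMemory(keys), 1024));
  }
}
```

   With a real listing call, the page size would be something like 
``LISTING_PAGE_SIZE``, and whether the start key is inclusive or exclusive 
would need to be checked against the actual API.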
   
   I couldn't spare much time to dig in; please feel free to proceed if you are 
convinced.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

