ayushtkn commented on code in PR #3554:
URL: https://github.com/apache/ozone/pull/3554#discussion_r935027796
##########
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OmBucketReadWriteKeyOps.java:
##########
@@ -128,12 +128,10 @@ protected String createPath(String path) {
@Override
protected int getReadCount(int readCount, String readPath)
throws IOException {
- Iterator<? extends OzoneKey> ozoneKeyIterator = bucket.listKeys(
- OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX);
- while (ozoneKeyIterator.hasNext()) {
- ozoneKeyIterator.next();
- ++readCount;
- }
+ List<OzoneFileStatus> ozoneFileStatusList = bucket.listStatus(
+ OzoneConsts.OM_KEY_PREFIX + readPath + OzoneConsts.OM_KEY_PREFIX, true,
+ "/", keyCountForRead);
Review Comment:
On a quick look, I think the client is controlling the number of entries here
via ``keyCountForRead``.
In the FileSystem code as well, it seems to be controlled on the client side:
https://github.com/apache/ozone/blob/a32549cbc1227291d6ebd972ce32db043a5c51d0/hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java#L608
In HDFS, AFAIK, the Namenode imposes such limits and the client can't
explicitly demand a number of entries:
https://github.com/apache/hadoop/blob/123d1aa8846a2099b04fd8e8ed7f2bd6db4f36b5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java#L364-L366
The use case here seems to be just getting the ``readCount``, which doesn't
actually require ``OzoneFileStatus``; it can easily be looped using
``LISTING_PAGE_SIZE`` or a similar number. At a scale of around 1 million
entries or beyond, you would use a lot of memory storing ``FileStatus`` objects
that you don't actually need. If I remember correctly, it took around 1.5 GB
per million entries when I tried something similar with HDFSFileStatus in
DistCp, which sadly builds up the FileStatus tree using a breadth-first
traversal. It would surely be less for Ozone than for HDFS.
I couldn't spare time to dig in much. Please feel free to proceed if you are
convinced.
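
A minimal sketch of the paged-count idea, to make the suggestion concrete. It
is not the Ozone API: ``PageSource``, ``PagedKeyCounter``, ``countKeys`` and
``inMemory`` are hypothetical names introduced only for illustration, and the
in-memory source stands in for whatever listing call the bucket actually
exposes. The point is that only one page of entries is held at a time, and
only the running count survives between pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical stand-in for a paged listing call; NOT the Ozone client API. */
interface PageSource {
  /**
   * Returns up to pageSize keys that sort strictly after startKey.
   * An empty startKey means "start from the beginning"; keys are assumed
   * to be non-empty strings.
   */
  List<String> listPage(String startKey, int pageSize);
}

final class PagedKeyCounter {
  private PagedKeyCounter() { }

  /** Counts all keys by walking fixed-size pages and keeping only the count. */
  static long countKeys(PageSource source, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    long count = 0;
    String startKey = "";
    while (true) {
      List<String> page = source.listPage(startKey, pageSize);
      count += page.size();
      // A short (or empty) page means the listing is exhausted.
      if (page.size() < pageSize) {
        return count;
      }
      // Resume after the last key seen; the previous page can now be collected.
      startKey = page.get(page.size() - 1);
    }
  }

  /** In-memory page source over a sorted key set, used only to exercise the loop. */
  static PageSource inMemory(SortedSet<String> keys) {
    return (startKey, pageSize) -> {
      List<String> page = new ArrayList<>();
      for (String key : keys.tailSet(startKey)) {
        if (key.equals(startKey)) {
          continue; // tailSet is inclusive; skip the resume marker itself
        }
        if (page.size() == pageSize) {
          break;
        }
        page.add(key);
      }
      return page;
    };
  }
}
```

With this shape, peak memory is bounded by the page size rather than by the
total number of entries, at the cost of one extra listing call per page.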
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]