fanlinqian commented on PR #1725:
URL: https://github.com/apache/hadoop/pull/1725#issuecomment-1336370086
Hello, I encountered a bug when using the batched listing method. When I pass
in a directory that contains more than 1000 files, each with 2 replicas of its
data block, only the first 500 files of the directory are returned and then the
listing stops. I think the getBatchedListing() method in
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
should be modified as follows.
for (; srcsIndex < srcs.length; srcsIndex++) {
  String src = srcs[srcsIndex];
  HdfsPartialListing listing;
  try {
    DirectoryListing dirListing = getListingInt(dir, pc,
        src, indexStartAfter, needLocation);
    if (dirListing == null) {
      throw new FileNotFoundException("Path " + src + " does not exist");
    }
    listing = new HdfsPartialListing(srcsIndex,
        Lists.newArrayList(dirListing.getPartialListing()));
    numEntries += listing.getPartialListing().size();
    lastListing = dirListing;
  } catch (Exception e) {
    if (e instanceof AccessControlException) {
      logAuditEvent(false, operationName, src);
    }
    listing = new HdfsPartialListing(srcsIndex,
        new RemoteException(e.getClass().getCanonicalName(), e.getMessage()));
    lastListing = null;
    LOG.info("Exception listing src {}", src, e);
  }
  listings.put(srcsIndex, listing);
  // My modification: stop if the current directory was only partially listed.
  // lastListing is null when the listing above threw, so check that first.
  if (lastListing != null && lastListing.getRemainingEntries() != 0) {
    break;
  }
  // Reset the start-after cookie before moving on to the next source path
  if (indexStartAfter.length != 0) {
    indexStartAfter = new byte[0];
  }
  // Terminate if we've reached the maximum listing size
  if (numEntries >= dir.getListLimit()) {
    break;
  }
}
The main cause of this bug is that the result returned by
getListingInt(dir, pc, src, indexStartAfter, needLocation) is capped by two
limits at the same time: the number of files in the directory, and the number
of data blocks and replicas of those files. With 2 replicas per block, the
second limit is presumably what stops the listing at 500 files. But
getBatchedListing() only exits the loop when the number of returned entries
reaches dir.getListLimit() (1000 here), so it never notices that the directory
was only partially listed.
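To make the interaction concrete, here is a minimal, self-contained sketch of
the behaviour described above. It is not the HDFS code: the class
BatchedListingSketch, the methods listOnce() and batchOnce(), and the Page type
are hypothetical stand-ins, and the model assumes one block per file and a
single limit of 1000 that applies to both entries and block locations.

```java
// Hypothetical simulation; none of these names exist in Hadoop.
class BatchedListingSketch {
  // Stands in for dir.getListLimit(); 1000 is the value used in this report.
  static final int LIST_LIMIT = 1000;

  /** Result of one simulated single-directory listing call. */
  static final class Page {
    final int returned;   // entries returned by this call
    final int remaining;  // entries still left in the directory
    Page(int returned, int remaining) {
      this.returned = returned;
      this.remaining = remaining;
    }
  }

  /**
   * Simulated getListingInt(): stops at LIST_LIMIT entries or LIST_LIMIT
   * block locations, whichever is reached first (assumed model).
   */
  static Page listOnce(int fileCount, int locationsPerFile) {
    int entries = 0;
    int locations = 0;
    while (entries < fileCount && entries < LIST_LIMIT
        && locations < LIST_LIMIT) {
      entries++;
      locations += locationsPerFile;
    }
    return new Page(entries, fileCount - entries);
  }

  /**
   * Simulated batched loop over several directories.
   * Returns {index at which the loop stopped, total entries returned}.
   */
  static int[] batchOnce(int[] fileCounts, boolean breakOnPartial) {
    int numEntries = 0;
    int srcsIndex = 0;
    for (; srcsIndex < fileCounts.length; srcsIndex++) {
      Page last = listOnce(fileCounts[srcsIndex], 2); // 2 replicas per file
      numEntries += last.returned;
      // Proposed check: stop on a partially listed directory.
      if (breakOnPartial && last.remaining != 0) {
        break;
      }
      // Existing check: stop once the entry limit is reached.
      if (numEntries >= LIST_LIMIT) {
        break;
      }
    }
    return new int[] {srcsIndex, numEntries};
  }

  public static void main(String[] args) {
    int[] dirs = {1200, 10}; // a large directory followed by a small one
    int[] without = batchOnce(dirs, false);
    int[] with = batchOnce(dirs, true);
    System.out.println("without check: stopped at index " + without[0]
        + ", entries " + without[1]);
    System.out.println("with check:    stopped at index " + with[0]
        + ", entries " + with[1]);
  }
}
```

In this model, without the extra check the loop moves past the first directory
after only 500 of its 1200 entries, because 500 is below the entry limit. With
the check, the loop stops at index 0, so the caller can resume in the same
directory.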
Looking forward to your reply.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]