[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253955#comment-16253955 ]
Chris Douglas commented on HDFS-12681:
--------------------------------------

bq. now it can't distinguish whether it needs an RPC call, so we need to directly call fs.getFileBlockLocations?

v06 of the patch (not v05, sorry, I mixed them up) would not make an RPC if the {{FileStatus}} included locations:
{noformat}
diff --git hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
index a8a5cfa..617cbf4 100644
--- hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
+++ hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
@@ -237,6 +236,12 @@ String getPathName(Path file) {
     if (file == null) {
       return null;
     }
+    if (file instanceof LocatedFileStatus) {
+      BlockLocation[] loc = ((LocatedFileStatus)file).getBlockLocations();
+      if (loc != null) {
+        return loc;
+      }
+    }
     return getFileBlockLocations(file.getPath(), start, len);
   }
{noformat}
This changes the semantics for HDFS (i.e., it won't refresh locations), and the change to MapReduce:
{noformat}
diff --git hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
index 3e0ea25..0f0a45b 100644
--- hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
+++ hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
@@ -344,11 +344,7 @@ protected FileSplit makeSplit(Path file, long start, long length,
         if (length != 0) {
           FileSystem fs = path.getFileSystem(job);
           BlockLocation[] blkLocations;
-          if (file instanceof LocatedFileStatus) {
-            blkLocations = ((LocatedFileStatus) file).getBlockLocations();
-          } else {
-            blkLocations = fs.getFileBlockLocations(file, 0, length);
-          }
+          blkLocations = fs.getFileBlockLocations(file, 0, length);
{noformat}
would have added additional RPC traffic for non-HDFS {{FileSystem}} implementations that rely on the type to determine if they need locations.

{{makeQualified\[Located\]}} are internal methods that allow HDFS to lazily bind {{FileStatus}} fields (improving space efficiency and avoiding some conversions). Clients shouldn't need to call them. We _hope_ that clients would request locations in the first RPC call, rather than asking for a {{FileStatus}} and then requesting its block locations.

> Fold HdfsLocatedFileStatus into HdfsFileStatus
> ----------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 3.1.0
>
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch,
> HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch,
> HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch,
> HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
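
For anyone following the thread, the short-circuit in the v06 {{DistributedFileSystem}} hunk quoted in the comment can be sketched in isolation. This is only an illustration of the control flow, not the patch itself: {{SketchFileStatus}}, {{SketchLocatedFileStatus}}, {{SketchFileSystem}} and {{fetchLocationsByPath}} are hypothetical stand-ins rather than Hadoop classes, block locations are modelled as plain strings, and a counter takes the place of a real RPC.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Entry point first so the file can also be run with the single-file launcher.
class LocationLookupSketch {
    public static void main(String[] args) {
        SketchFileSystem fs = new SketchFileSystem();
        String[] cached = {"host-x"};

        // Status already carries locations: no simulated RPC is made.
        fs.getFileBlockLocations(new SketchLocatedFileStatus("/a", cached));
        // Plain status: falls back to the path-based lookup.
        fs.getFileBlockLocations(new SketchFileStatus("/b"));

        System.out.println("simulated RPCs: " + fs.rpcCount.get());
    }
}

// Hypothetical stand-in for FileStatus; only carries a path.
class SketchFileStatus {
    final String path;

    SketchFileStatus(String path) {
        this.path = path;
    }
}

// Hypothetical stand-in for LocatedFileStatus; locations may be null
// when they were not fetched with the status.
class SketchLocatedFileStatus extends SketchFileStatus {
    final String[] locations;

    SketchLocatedFileStatus(String path, String[] locations) {
        super(path);
        this.locations = locations;
    }
}

// Hypothetical stand-in for a FileSystem; counts simulated RPCs.
class SketchFileSystem {
    final AtomicInteger rpcCount = new AtomicInteger();

    // Assumed path-based lookup that always costs one simulated RPC.
    String[] fetchLocationsByPath(String path) {
        rpcCount.incrementAndGet();
        return new String[] {"host-a", "host-b"};
    }

    // Same shape as the quoted hunk: reuse locations carried by the
    // status when present, otherwise fall back to the path-based lookup.
    String[] getFileBlockLocations(SketchFileStatus file) {
        if (file == null) {
            return null;
        }
        if (file instanceof SketchLocatedFileStatus) {
            String[] loc = ((SketchLocatedFileStatus) file).locations;
            if (loc != null) {
                return loc;
            }
        }
        return fetchLocationsByPath(file.path);
    }
}
```

As in the comment, the trade-off is that a status carrying locations is never refreshed by this call, while a caller that always goes through the path-based lookup pays one round trip per file.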