Propose some changes to FileContext
-----------------------------------

                 Key: HADOOP-6678
                 URL: https://issues.apache.org/jira/browse/HADOOP-6678
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: Hairong Kuang
             Fix For: 0.21.0, 0.22.0


# Add a method Iterator<FileStatus> listStatus(Path). This lets the HDFS client
avoid holding the whole directory listing in memory, and lets it benefit more
from the iterative listing added in HDFS-985. Move the current
FileStatus[] listStatus(Path) to a utility method.
# Remove the methods isFile(Path), isDirectory(Path), and exists(Path).
All of these methods are implemented by calling getFileStatus(Path), but most
users are not aware of this. They tend to write code like the following:
{code}
  FileContext fc = ...;
  if (fc.exists(path)) {      // getFileStatus RPC #1
    if (fc.isFile(path)) {    // getFileStatus RPC #2, for the same path
      ...
    } else {
      ...
    }
  }
{code}
The code above issues an unnecessary getFileInfo RPC to the NameNode, because
exists() and isFile() each fetch the same file status. In our production
clusters, we often see the number of getFileStatus calls being several times
the number of open calls. If we remove isFile, isDirectory, and exists from
FileContext, users will have to call getFileStatus explicitly, which makes it
more likely that they will write more efficient code such as:
{code}
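  FileContext fc = ...;
  try {
    // One getFileInfo RPC answers both "does it exist?" and "file or directory?"
    FileStatus fstatus = fc.getFileStatus(path);
    if (fstatus.isFile()) {
      ...
    } else {
      ...
    }
  } catch (FileNotFoundException e) {
    // path does not exist: the case where exists(path) would have returned false
  }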
{code}
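The iterator proposed in item 1 can be sketched as follows. This is a self-contained illustration under stated assumptions, not the FileContext API: directory entries are plain String names standing in for FileStatus, and fetchBatch is a hypothetical stand-in for the NameNode's partial-listing call from HDFS-985. The iterator keeps at most one batch in memory, and listStatusArray shows how the array-returning method could become a utility layered on top of it.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a lazy, batched directory listing (names are illustrative).
final class BatchedListingSketch {

    // Hypothetical stand-in for a partial-listing RPC: returns up to batchSize
    // non-empty names from the sorted directory that sort after startAfter.
    static List<String> fetchBatch(List<String> sortedDir, String startAfter, int batchSize) {
        List<String> batch = new ArrayList<>();
        for (String name : sortedDir) {
            if (name.compareTo(startAfter) > 0) {
                batch.add(name);
                if (batch.size() == batchSize) {
                    break;
                }
            }
        }
        return batch;
    }

    // Hypothetical analogue of the proposed Iterator<FileStatus> listStatus(Path):
    // holds at most one batch in memory and fetches the next one on demand.
    static Iterator<String> listStatus(List<String> sortedDir, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new Iterator<String>() {
            private List<String> batch = fetchBatch(sortedDir, "", batchSize);
            private int pos = 0;

            @Override
            public boolean hasNext() {
                if (pos < batch.size()) {
                    return true;
                }
                // A short batch means the server had nothing more to return.
                if (batch.size() < batchSize) {
                    return false;
                }
                // Full batch exhausted: resume after the last name we saw.
                String last = batch.get(batch.size() - 1);
                batch = fetchBatch(sortedDir, last, batchSize);
                pos = 0;
                return !batch.isEmpty();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.get(pos++);
            }
        };
    }

    // Utility that materializes the whole listing, like the current array-returning method.
    static String[] listStatusArray(List<String> sortedDir, int batchSize) {
        List<String> all = new ArrayList<>();
        Iterator<String> it = listStatus(sortedDir, batchSize);
        while (it.hasNext()) {
            all.add(it.next());
        }
        return all.toArray(new String[0]);
    }
}
{code}

A caller that only needs to scan entries uses the iterator directly and never holds more than batchSize names; only callers that really need the full array pay for materializing it.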

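To make the RPC arithmetic of item 2 concrete, the following hypothetical in-memory client counts one RPC per getFileStatus call and builds exists and isFile on top of it, assuming (as the description states) that these convenience methods delegate to getFileStatus. The class, its Status record, and the classify helpers are illustrative only, not Hadoop types.

{code:java}
import java.io.FileNotFoundException;
import java.util.Map;

// Hypothetical client in which every metadata lookup costs one RPC.
final class RpcCountingFs {

    // Hypothetical minimal status: only the file/directory distinction.
    static final class Status {
        private final boolean file;
        Status(boolean file) { this.file = file; }
        boolean isFile() { return file; }
    }

    private final Map<String, Boolean> entries; // path -> true for file, false for directory
    int rpcCount = 0;

    RpcCountingFs(Map<String, Boolean> entries) { this.entries = entries; }

    // Models getFileStatus: one RPC per call, throws if the path is absent.
    Status getFileStatus(String path) throws FileNotFoundException {
        rpcCount++;
        Boolean isFile = entries.get(path);
        if (isFile == null) {
            throw new FileNotFoundException(path);
        }
        return new Status(isFile);
    }

    // Convenience wrappers: each one hides a full getFileStatus RPC.
    boolean exists(String path) {
        try {
            getFileStatus(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    boolean isFile(String path) {
        try {
            return getFileStatus(path).isFile();
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    // The exists()-then-isFile() pattern from the first example.
    static String classifyWithWrappers(RpcCountingFs fs, String path) {
        if (fs.exists(path)) {
            return fs.isFile(path) ? "file" : "directory";
        }
        return "missing";
    }

    // The single getFileStatus() pattern from the second example.
    static String classifyWithStatus(RpcCountingFs fs, String path) {
        try {
            return fs.getFileStatus(path).isFile() ? "file" : "directory";
        } catch (FileNotFoundException e) {
            return "missing";
        }
    }
}
{code}

For a path that exists, classifyWithWrappers costs two RPCs and classifyWithStatus costs one; for a missing path both cost one. The doubled cost therefore lands on exactly the common case where the path is present.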