[ 
https://issues.apache.org/jira/browse/HDFS-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593164#comment-16593164
 ] 

wuchang commented on HDFS-13759:
--------------------------------

[~xkrogen]

Thank you very much for your response. I took a deep look into the client-side 
and server-side source code to check whether current HDFS can provide the 
pagination I expected. I believe a pagination feature is necessary for many 
HDFS API users.

For the *offset* feature:
 I find that, in *DFSClient#listPaths*,
{code:java}
public DirectoryListing listPaths(String src, byte[] startAfter)
    throws IOException {
  return listPaths(src, startAfter, false);
}{code}
 we have to provide a *startAfter* parameter, which specifies the name to start 
listing after, rather than a numeric offset, so this is not the pagination we 
want.
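For reference, this is roughly how client code pages through a directory with the existing cursor-style API today; it is only a sketch, and the directory path is just an example:
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DirectoryListing;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

// Cursor-style paging with the existing API: the "cursor" is the last
// returned name (startAfter), not a numeric offset, and the page size is
// whatever the NameNode is configured with.
static void listByCursor(DFSClient dfsClient, String dir) throws IOException {
  byte[] startAfter = HdfsFileStatus.EMPTY_NAME;
  DirectoryListing page = dfsClient.listPaths(dir, startAfter);
  while (page != null) {
    for (HdfsFileStatus status : page.getPartialListing()) {
      System.out.println(status.getLocalName());
    }
    if (!page.hasMore()) {
      break;
    }
    startAfter = page.getLastName();
    page = dfsClient.listPaths(dir, startAfter);
  }
}
{code}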

 

For the *limit* feature:
 According to the source code, I found that
{code:java}
dfs.list.limit {code}
is a NameNode-side configuration item, not something that can be set by the 
client or configured in the client-side *hdfs-site.xml*, so it cannot be 
controlled by the client.
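To illustrate, a client-side override like the one below (using the key name mentioned above and an example path) only changes the client's local Configuration; the NameNode still applies its own limit when it builds each partial listing:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Sketch: the listing limit is enforced by the NameNode from its own
// configuration, so this client-side value is effectively ignored.
static void attemptClientSideLimit() throws IOException {
  Configuration conf = new Configuration();
  conf.setInt("dfs.list.limit", 100);   // client-local only; NameNode never sees it
  FileSystem fs = FileSystem.get(conf);
  RemoteIterator<FileStatus> it =
      fs.listStatusIterator(new Path("/user/chang.wu/flat_level_1"));
  while (it.hasNext()) {
    System.out.println(it.next().getPath());   // still paged by the NameNode's limit
  }
}
{code}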

So, based on my understanding, should we make the improvements below so that 
HDFS supports true pagination?

1. On the client side, add interface methods like the ones below:
{code:java}
public DirectoryListing listPaths(String src, int offset) throws IOException;
public DirectoryListing listPaths(String src, int offset, int limit)
    throws IOException;
public DirectoryListing listPaths(String src, int offset, int limit,
    boolean needLocation) throws IOException;{code}
When limit is not specified, it will default to the value configured by
{code:java}
dfs.list.limit{code}
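For illustration, a call against the proposed overloads might look like this (none of these overloads exist yet; the client handle, path, and numbers are placeholders):
{code:java}
// Hypothetical usage of the proposed API: skip the first 1000 entries and
// return at most 500, i.e. fetch the third "page" of size 500.
DirectoryListing page = dfsClient.listPaths(
    "/user/chang.wu/flat_level_1",
    /* offset */ 1000,
    /* limit  */ 500);
for (HdfsFileStatus status : page.getPartialListing()) {
  System.out.println(status.getLocalName());
}
{code}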

2. Correspondingly, we should extend the protobuf interface to carry these 
parameters, roughly as below (*offset* and *limit* should be integer fields 
rather than bytes/bool; to stay wire-compatible, it may be better to add them 
as new optional fields alongside the existing *startAfter* instead of changing 
required fields):
{code:java}
// File: ClientNamenodeProtocol.proto
message GetListingRequestProto {
  required string src = 1;
  required uint32 offset = 2;
  required uint32 limit = 3;
  required bool needLocation = 4;
}
{code}
3. On the NameNode server side, we should add methods to support the 
pagination, touching ClientNamenodeProtocolServerSideTranslatorPB.java, 
NameNodeRpcServer.java, and so on; a rough sketch follows below.
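Purely as an illustration of step 3, the translator method might look roughly like the sketch below, which would slot into the existing ClientNamenodeProtocolServerSideTranslatorPB. The offset/limit accessors and the new server-side getListing overload are hypothetical; only the surrounding structure follows the existing getListing translation.
{code:java}
@Override
public GetListingResponseProto getListing(RpcController controller,
    GetListingRequestProto req) throws ServiceException {
  try {
    // Hypothetical: hand a numeric offset/limit to a new NameNodeRpcServer
    // overload instead of the byte[] startAfter cursor used today.
    DirectoryListing result = server.getListing(
        req.getSrc(),
        req.getOffset(),       // hypothetical proto field
        req.getLimit(),        // hypothetical proto field
        req.getNeedLocation());
    if (result == null) {
      return VOID_GETLISTING_RESPONSE;   // mirrors today's null handling
    }
    return GetListingResponseProto.newBuilder()
        .setDirList(PBHelper.convert(result))
        .build();
  } catch (IOException e) {
    throw new ServiceException(e);
  }
}
{code}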

 

> [HDFS Pagination]Does HDFS Java api Support Pagination?
> -------------------------------------------------------
>
>                 Key: HDFS-13759
>                 URL: https://issues.apache.org/jira/browse/HDFS-13759
>             Project: Hadoop HDFS
>          Issue Type: Wish
>          Components: fs, fs async
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: wuchang
>            Priority: Major
>              Labels: HDFS, pagination
>
> I could use *FileSystem*
> {code:java}
> RemoteIterator<FileStatus> listed = fs.listStatusIterator(
>     new Path("hdfs://warehousestore/user/chang.wu/flat_level_1"));{code}
> like this to get files *asynchronously*.
> But what I actually want is pagination support, where I could pass two 
> parameters,
> {code:java}
> offset{code}
> and
> {code:java}
> limit{code}
> , like MySQL does, to get part of the files under some directory.
> I know I could implement pagination by wrapping *listStatusIterator*, but I 
> think wrapping this iterator is inefficient.
> I wonder why the HDFS Java API cannot support that?


