[
https://issues.apache.org/jira/browse/HDFS-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593164#comment-16593164
]
wuchang commented on HDFS-13759:
--------------------------------
[~xkrogen]
Many thanks for your valuable response. I dug into the client-side
and server-side source code to check whether current HDFS can provide the
pagination I expected. I believe a pagination feature is
necessary for many HDFS API users.
For the *offset* feature:
Looking at *DFSClient#listPaths*,
{code:java}
public DirectoryListing listPaths(String src, byte[] startAfter)
throws IOException {
return listPaths(src, startAfter, false);
}{code}
we can see that we have to provide a *startAfter* parameter, which
specifies the name of the entry after which the listing resumes, rather than
a numeric offset, so this is not the pagination we want.
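To make the difference concrete, here is a minimal sketch of cursor-style listing and of what emulating a numeric offset on top of it costs. It is an in-memory model only: the class `CursorListingSketch`, its methods, and the use of `String` names in a `TreeSet` are illustrative assumptions, not HDFS code (the real *startAfter* is a `byte[]`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a directory; not HDFS code.
class CursorListingSketch {

    // Returns up to `limit` names that sort strictly after `startAfter`.
    // An empty `startAfter` means "start from the beginning".
    static List<String> listAfter(NavigableSet<String> names, String startAfter, int limit) {
        List<String> page = new ArrayList<>();
        for (String name : names.tailSet(startAfter, false)) {
            if (page.size() == limit) {
                break;
            }
            page.add(name);
        }
        return page;
    }

    // Emulates a numeric offset on top of the cursor call by walking
    // batch by batch until `offset` entries have been skipped.
    static List<String> listAtOffset(NavigableSet<String> names, int offset, int limit) {
        String cursor = "";
        int skipped = 0;
        while (skipped < offset) {
            List<String> batch = listAfter(names, cursor, Math.min(limit, offset - skipped));
            if (batch.isEmpty()) {
                return batch; // offset is past the end (or limit is 0)
            }
            skipped += batch.size();
            cursor = batch.get(batch.size() - 1);
        }
        return listAfter(names, cursor, limit);
    }
}
```

With the names a, b, c, d, e, `listAfter(names, "b", 2)` gives [c, d], while `listAtOffset(names, 3, 2)` has to fetch and discard the first three names before it can return [d, e]. That extra walking is the cost of not having an offset in the API.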
For the *limit* feature:
According to the source code, I found that
{code:java}
dfs.ls.limit{code}
is a NameNode-side configuration item, not one that can be set by the
client or configured in the client-side *hdfs-site.xml*. So it cannot be
controlled by the client.
So, as I understand it, should we make the improvements below so that
HDFS supports true pagination?
1. On the client side, add interface methods like the following:
{code:java}
public DirectoryListing listPaths(String src, int offset) throws IOException;
public DirectoryListing listPaths(String src, int offset, int limit)
throws IOException;
public DirectoryListing listPaths(String src, int offset, int limit,
boolean needLocation) throws IOException;{code}
When limit is not specified, the limit will be the default value configured by
{code:java}
dfs.ls.limit{code}
2. Correspondingly, we should extend the protobuf interface to support these
methods, like below:
{code:java}
// File: ClientNamenodeProtocol.proto
message GetListingRequestProto {
required string src = 1;
required uint32 offset = 2;
required uint32 limit = 3;
required bool needLocation = 4;
}
{code}
3. On the NameNode server side, we should add a method to support the
pagination, touching ClientNamenodeProtocolServerSideTranslatorPB.java,
NameNodeRpcServer.java and so on.
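The semantics I have in mind for the three steps above can be sketched as follows. This is only a model under stated assumptions: `OffsetLimitListingSketch`, the `DEFAULT_LIMIT` value of 1000, and the use of a plain sorted `List<String>` for the directory children are all hypothetical and are not NameNode code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposed offset/limit semantics; not NameNode code.
class OffsetLimitListingSketch {

    // Stand-in for the server-side default batch size (assumed value).
    static final int DEFAULT_LIMIT = 1000;

    // Overload without a limit: falls back to the assumed server default.
    static List<String> listPaths(List<String> sortedChildren, int offset) {
        return listPaths(sortedChildren, offset, DEFAULT_LIMIT);
    }

    // Returns at most `limit` children, starting at index `offset`.
    static List<String> listPaths(List<String> sortedChildren, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        if (offset >= sortedChildren.size()) {
            return Collections.emptyList(); // past the end of the directory
        }
        // Compute the end index in long arithmetic to avoid int overflow.
        int end = (int) Math.min((long) offset + limit, sortedChildren.size());
        return new ArrayList<>(sortedChildren.subList(offset, end));
    }
}
```

For children a, b, c, d, e, `listPaths(children, 1, 2)` returns [b, c] and `listPaths(children, 4, 10)` returns [e]. Note that a plain index like this is only stable if the directory does not change between calls, which a real implementation would have to consider.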
> [HDFS Pagination]Does HDFS Java api Support Pagination?
> -------------------------------------------------------
>
> Key: HDFS-13759
> URL: https://issues.apache.org/jira/browse/HDFS-13759
> Project: Hadoop HDFS
> Issue Type: Wish
> Components: fs, fs async
> Affects Versions: 2.6.0, 2.7.0, 2.8.0
> Reporter: wuchang
> Priority: Major
> Labels: HDFS, pagination
>
> I could use *FileSystem*
> {code:java}
> RemoteIterator<FileStatus> listed = fs.listStatusIterator(new
> Path("hdfs://warehousestore/user/chang.wu/flat_level_1"));{code}
> like this to get files *asynchronously*.
> But what I actually want is pagination support, where I could pass two
> parameters, the
> {code:java}
> offset{code}
> and
> {code:java}
> limit{code}
> , like MySQL does, to get a subset of the files under a directory.
> I know I could implement pagination myself by wrapping
> *listStatusIterator*, but I think wrapping this iterator is inefficient.
> I wonder why the HDFS Java API cannot support that?