[ https://issues.apache.org/jira/browse/HADOOP-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511280 ]

Doug Cutting commented on HADOOP-1568:
--------------------------------------

> Under our proposal the planner does 1 HTTP GET. Under your proposal, the
> planner does 10 million (or 40 million without caching) serial HTTP HEAD
> operations.

What is your proposal?  What is the task?  I thought it was listing a directory
to be copied.  That would take one HEAD per file in the directory.  Are you
copying a directory with 10M files?  Why are you multiplying by 4?  Each file
only needs to be stat'd once.  It doesn't even need that if we're willing to
forgo sorting by length: a single GET returning HTML could just list the names
to be copied.  Recursive listings would require caching isDir in the path, but
could still be reduced to a single GET per directory.
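The request count for a recursive listing can be sketched in Java. This is a minimal illustration, assuming a hypothetical `listDirectory` call that stands in for one HTTP GET of a directory listing and returns each child's path together with its isDir bit; the in-memory namespace and all class and method names are invented for the example. The walk issues one GET per directory and no per-file HEAD.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecursiveListingSketch {

    /** One listing entry: a path plus the isDir bit delivered with the listing. */
    public record Entry(String path, boolean isDir) {}

    // Hypothetical in-memory namespace standing in for the namenode:
    // directory path -> its immediate children.
    private final Map<String, List<Entry>> namespace = new LinkedHashMap<>();
    private int getRequests = 0;

    public void addDir(String dir, Entry... children) {
        namespace.put(dir, List.of(children));
    }

    /** Stands in for a single HTTP GET of one directory's listing (assumption). */
    private List<Entry> listDirectory(String dir) {
        getRequests++;
        return namespace.getOrDefault(dir, List.of());
    }

    /** Collects every file under root, issuing exactly one GET per directory. */
    public List<String> listFilesRecursively(String root) {
        List<String> files = new ArrayList<>();
        walk(root, files);
        return files;
    }

    private void walk(String dir, List<String> files) {
        for (Entry e : listDirectory(dir)) {
            if (e.isDir()) {
                walk(e.path(), files); // isDir came with the listing, so no HEAD is needed
            } else {
                files.add(e.path());
            }
        }
    }

    public int getRequests() {
        return getRequests;
    }

    public static void main(String[] args) {
        RecursiveListingSketch fs = new RecursiveListingSketch();
        fs.addDir("/", new Entry("/a", true), new Entry("/x.txt", false));
        fs.addDir("/a", new Entry("/a/y.txt", false), new Entry("/a/z.txt", false));
        List<String> files = fs.listFilesRecursively("/");
        System.out.println(files + " via " + fs.getRequests() + " GETs");
    }
}
```

Running `main` on this two-directory tree prints `[/a/y.txt, /a/z.txt, /x.txt] via 2 GETs`. Carrying the length in each entry, as the proposed XML listing does with `size`, would keep sorting by length at the same request count.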

> NameNode Schema for HttpFileSystem
> ----------------------------------
>
>                 Key: HADOOP-1568
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1568
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>         Attachments: ls-xml.patch
>
>
> This issue will track the design and implementation of (the first pass of) a 
> servlet on the namenode for querying its filesystem via HTTP. The proposed 
> syntax for queries and responses is as follows.
> *Query*
> {noformat}GET http://<nn>:<port>/ls.jsp[<?option>[&option]*] HTTP/1.1{noformat}
> Where _option_ may be any of the following query parameters:
> _path_ : String (default: '/')
> _recursive_ : boolean (default: false)
> _filter_ : String (default: none)
> *Response*
> The response will be returned as an XML document in the following format:
> {noformat}
> <listing path="..." recursive="(yes|no)" filter="..."
>          time="yyyy-MM-dd hh:mm:ss UTC" version="...">
>   <directory path="..."/>
>   <file path="..." modified="yyyy-MM-dd hh:mm:ss" blocksize="..."
>         replication="..." size="..."
>         dnurl="http://dn:port/streamFile?..."/>
> </listing>
> {noformat}
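A client of the proposed servlet might look roughly like the following Java sketch, using only the JDK's `URLEncoder` and DOM parser. It relies on the query options and element names given in the proposal above (`path`, `recursive`, `filter`; `<file path="..." size="...">`) and additionally assumes that the servlet accepts `true`/`false` for `recursive`. The class name, host, port and sample document are hypothetical, and the HTTP fetch itself is left out so the parsing can run on a literal string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class LsJspClientSketch {

    /** Builds a GET URL for the proposed ls.jsp servlet; filter may be null. */
    public static String buildQuery(String nn, int port, String path,
                                    boolean recursive, String filter) {
        StringBuilder url = new StringBuilder("http://" + nn + ":" + port + "/ls.jsp");
        url.append("?path=").append(URLEncoder.encode(path, StandardCharsets.UTF_8));
        // Assumption: the servlet parses "true"/"false" for the boolean option.
        url.append("&recursive=").append(recursive);
        if (filter != null) {
            url.append("&filter=").append(URLEncoder.encode(filter, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    /** Maps each <file> element's path attribute to its size, in document order. */
    public static Map<String, Long> fileSizes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList files = doc.getElementsByTagName("file");
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (int i = 0; i < files.getLength(); i++) {
                Element f = (Element) files.item(i);
                sizes.put(f.getAttribute("path"), Long.parseLong(f.getAttribute("size")));
            }
            return sizes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parsable listing document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("nn", 50070, "/user/data", true, null));
        // Hypothetical response following the proposed schema; values are invented.
        String sample =
              "<listing path=\"/user/data\" recursive=\"yes\" filter=\"\" time=\"...\" version=\"...\">"
            + "<directory path=\"/user/data/logs\"/>"
            + "<file path=\"/user/data/logs/a.log\" size=\"1024\" dnurl=\"http://dn:port/streamFile?...\"/>"
            + "<file path=\"/user/data/b.txt\" size=\"2048\" dnurl=\"http://dn:port/streamFile?...\"/>"
            + "</listing>";
        System.out.println(fileSizes(sample));
    }
}
```

Here `main` prints `http://nn:50070/ls.jsp?path=%2Fuser%2Fdata&recursive=true` followed by `{/user/data/logs/a.log=1024, /user/data/b.txt=2048}`. Since `getElementsByTagName` matches descendants at any depth, the parser does not depend on whether a recursive listing comes back flat or nested.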

-- 
This message is automatically generated by JIRA.