[ http://issues.apache.org/jira/browse/HADOOP-341?page=comments#action_12421686 ]

Arun C Murthy commented on HADOOP-341:
--------------------------------------
I have a further enhancement to distcp: the -f option now works with URLs of scheme http/dfs/file. Hence I'm reopening this issue and will submit another patch shortly.

Doug, I'll also update logalyzer (HADOOP-342) to reflect these changes, so another patch will be needed there too; please hold off on commits there.

thanks,
Arun

> Enhance distcp to handle *http* as a 'source protocol'.
> -------------------------------------------------------
>
>          Key: HADOOP-341
>          URL: http://issues.apache.org/jira/browse/HADOOP-341
>      Project: Hadoop
>   Issue Type: Improvement
>   Components: util
>     Reporter: Arun C Murthy
>      Fix For: 0.5.0
>
>  Attachments: distcp.patch, distcp2.patch
>
>
> Requirements:
>   Presently distcp recursively copies a directory from one dfs to another,
>   i.e. both source and destination are of the *dfs* protocol.
>   Enhance it to handle *http* as the source protocol, i.e. support copying
>   files from arbitrary http-based sources into the dfs.
> Design:
>   Follow distcp's current design: one map task per file which needs to be
>   copied.
>   Caveat: distcp handles *recursive* copying by listing sub-directories; this
>   is not as feasible with an http-based source, since things like
>   'fancy-indexing' might not be enabled on the web-server (for all
>   sub-locations recursively too), and even if it is enabled it would mean
>   tedious parsing of the served html to glean the sub-directories etc. Hence
>   the idea is to support an input file (via a -f option) which contains a list
>   of the http-based urls which represent multiple source files.
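
For illustration only, here is a rough sketch of how a -f source list could be turned into one copy task per file, as the design describes. It is not taken from distcp.patch: the helper names (parse_source_list, plan_copy_tasks), the one-URL-per-line file layout, and the example URLs are all assumptions made for this sketch, and the actual patch is Java running inside map tasks.

```python
from urllib.parse import urlsplit

# Schemes the comment says the -f option accepts.
SUPPORTED_SCHEMES = {"http", "dfs", "file"}


def parse_source_list(lines):
    """Return (url, scheme) for each non-blank line (hypothetical helper).

    Assumes one source URL per line; this layout is an assumption.
    """
    sources = []
    for lineno, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        scheme = urlsplit(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"line {lineno}: unsupported scheme in {url!r}")
        sources.append((url, scheme))
    return sources


def plan_copy_tasks(sources, dest_dir):
    """Pair each source URL with a destination path: one task per file."""
    tasks = []
    for url, _scheme in sources:
        # Use the last path component of the URL as the destination file name.
        name = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
        tasks.append((url, dest_dir.rstrip("/") + "/" + name))
    return tasks


if __name__ == "__main__":
    listing = [
        "http://example.com/data/a.log\n",
        "\n",
        "dfs://namenode:9000/logs/b.log\n",
    ]
    for src, dst in plan_copy_tasks(parse_source_list(listing), "/user/arun/in"):
        print(src, "->", dst)
```

Listing the sources explicitly sidesteps the directory-listing problem from the caveat: nothing has to be discovered by parsing html, and each listed URL maps directly to one unit of work.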
