Derek Young wrote:
> Reading http://issues.apache.org/jira/browse/HADOOP-341 it sounds like
> this should be supported, but the http URLs are not working for me. Are
> http source URLs still supported?

No. They used to be supported, but when distcp was converted to accept
any Path this stopped working, since there is no FileSystem
implementation mapped to http: paths. Implementing an HttpFileSystem
that supports read-only access to files and no directory listings is
fairly trivial, but without directory listings, distcp would not work well.
https://issues.apache.org/jira/browse/HADOOP-1563 includes a now
long-stale patch that implements an HTTP filesystem, with directory
listings implemented on the assumption that:
- directories are represented by slash-terminated URLs;
- a GET of a directory returns a page containing the URLs of its
  children.
This works for the directory listings returned by many HTTP servers.
Perhaps someone can update this patch, and, if folks find it useful, we
can include it.
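
To make the two assumptions above concrete, here is a minimal sketch in
plain Java (no Hadoop classes) of how a listing could be derived from the
body of a GET on a directory URL. It is not the code from the HADOOP-1563
patch. The class name HttpListingSketch and its methods are hypothetical,
the href regex is a simplification that only suits simple index pages,
and the directory URL passed in is assumed to end with a slash already.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Hadoop or of the HADOOP-1563 patch. */
class HttpListingSketch {
    // Matches href="..." or href='...'; links carrying a query or fragment
    // (such as the column-sort links on some index pages) do not match.
    private static final Pattern HREF = Pattern.compile(
        "href\\s*=\\s*[\"']([^\"'#?]+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Assumption 1: a URL names a directory if and only if its path ends with a slash. */
    static boolean isDirectory(URI uri) {
        String path = uri.getPath();
        return path != null && path.endsWith("/");
    }

    /**
     * Assumption 2: the body returned by a GET of a directory contains links
     * to its children. Returns the direct children of dir found in html,
     * in document order, without duplicates.
     */
    static List<URI> listChildren(URI dir, String html) {
        List<URI> children = new ArrayList<>();
        String base = dir.toString();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI child;
            try {
                // Resolve relative links against the directory URL.
                child = dir.resolve(m.group(1)).normalize();
            } catch (IllegalArgumentException e) {
                continue; // skip links that are not valid URIs
            }
            String s = child.toString();
            if (!s.startsWith(base) || s.length() == base.length()) {
                continue; // parent link, self link, or a link to another site
            }
            String rest = s.substring(base.length());
            String name = rest.endsWith("/")
                ? rest.substring(0, rest.length() - 1) : rest;
            if (name.isEmpty() || name.contains("/")) {
                continue; // not a direct child of dir
            }
            if (!children.contains(child)) {
                children.add(child);
            }
        }
        return children;
    }
}
```

A FileSystem implementation along these lines would presumably call
something like listChildren() from its directory-listing method and use
isDirectory() to decide whether to recurse. Servers whose index pages do
not link to children this way would still not work.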
Doug