Christopher Tubbs created HADOOP-19816:
------------------------------------------

             Summary: Hadoop should ship with a URLStreamHandlerProvider to 
handle hdfs: URLs
                 Key: HADOOP-19816
                 URL: https://issues.apache.org/jira/browse/HADOOP-19816
             Project: Hadoop Common
          Issue Type: Wish
            Reporter: Christopher Tubbs


Many of Hadoop's APIs are centered on the use of a URI. However, `hdfs:` URIs 
are actually locations (URLs) and can be treated as such. Unfortunately, 
Hadoop doesn't ship with a URLStreamHandlerProvider, and `hdfs:` is not a 
built-in URL scheme in Java. As a result, applications (including the Hadoop 
client API itself) cannot use these URLs directly, because no provider in the 
system is capable of interpreting these URIs as URLs.
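
To illustrate the gap, here is a minimal sketch that checks whether the 
running JVM can turn an `hdfs:` URI into a java.net.URL. The class name, the 
helper method, and the host, port, and path are all made up for illustration. 
On a stock JDK with no provider installed, the `hdfs:` URI parses fine as a 
URI, but the conversion to a URL fails with a MalformedURLException.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class HdfsUrlCheck {

  // Returns true if this JVM can convert the URI to a java.net.URL, which
  // requires a stream handler for the URI's scheme to be available.
  static boolean isResolvableAsUrl(URI uri) {
    try {
      uri.toURL();
      return true;
    } catch (MalformedURLException e) {
      // Thrown when no handler is known for the scheme.
      return false;
    }
  }

  public static void main(String[] args) {
    // Hypothetical location; nothing is contacted, the URI is only parsed.
    URI uri = URI.create("hdfs://namenode:8020/tmp/example.txt");
    System.out.println("scheme parsed from URI: " + uri.getScheme());
    System.out.println("usable as a URL: " + isResolvableAsUrl(uri));
  }
}
```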

The Apache Accumulo project wrote a simple implementation to handle URLs with 
the `hdfs:` scheme. It does not handle other Hadoop file system schemes, like 
`viewfs:`, which might be nice to support. `file:` URLs, which Hadoop also 
supports, are probably best left to the built-in handler in Java, rather than 
to Hadoop's LocalFileSystem or RawLocalFileSystem implementations.
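
For readers unfamiliar with the mechanism, the following is a rough sketch of 
the shape such a provider could take. It is not the Accumulo implementation, 
and the class names are hypothetical. It claims only the `hdfs:` scheme and 
returns null for everything else, so `file:` and other schemes fall through 
to whatever else is installed. To keep the sketch free of Hadoop dependencies, 
the stream is not actually opened; a real provider would presumably delegate 
to Hadoop's FileSystem API at that point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.spi.URLStreamHandlerProvider;

// Hypothetical name; illustrates the SPI only, not a working HDFS client.
public class HdfsUrlStreamHandlerProviderSketch extends URLStreamHandlerProvider {

  @Override
  public URLStreamHandler createURLStreamHandler(String protocol) {
    // Only claim hdfs:. Returning null tells Java to keep looking elsewhere.
    if (!"hdfs".equals(protocol)) {
      return null;
    }
    return new URLStreamHandler() {
      @Override
      protected URLConnection openConnection(URL url) {
        return new HdfsConnectionSketch(url);
      }
    };
  }

  // Hypothetical connection type for the sketch.
  static final class HdfsConnectionSketch extends URLConnection {

    HdfsConnectionSketch(URL url) {
      super(url);
    }

    @Override
    public void connect() {
      // Nothing to do in this sketch.
    }

    @Override
    public InputStream getInputStream() throws IOException {
      // Assumption: a real implementation would open the file through the
      // Hadoop client here. This sketch has no Hadoop dependency, so it fails.
      throw new IOException("sketch only: no Hadoop client wired in for " + getURL());
    }
  }
}
```

On Java 9 or later, a provider like this is discovered through the service 
loader, typically by listing its fully qualified class name in a 
`META-INF/services/java.net.spi.URLStreamHandlerProvider` file on the class 
path.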

I think this would be better homed inside the Hadoop project, rather than left 
as an exercise for the user. For reference, our implementation is located at:

[https://github.com/apache/accumulo-classloaders/blob/21626c06236d6a96ed0c53ef7fc9c38c3d449a9a/modules/hdfs-urlstreamhandler-provider/README.md]



