I'm all for the change!
Of course, we don't have a Nutch base to upgrade.
On Apr 11, 2006, at 11:23 AM, Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-129?page=comments#action_12374087 ]
Doug Cutting commented on HADOOP-129:
-------------------------------------
URI actually *can* compute parent directory. For example:
URI subDir = new URI("/foo/bar/baz/");
URI parent = subDir.resolve("..");
parent.toString() returns "/foo/bar/".
So I think that URI has the features we want for filenames and not
much else. Am I missing something?
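A slightly fuller sketch of the same idea, in case it helps the discussion. The class and method names (UriParentSketch, parentOfDirectory, directoryOfFile) are made up for illustration and are not part of Hadoop. Only java.net.URI is used. Note that the trailing slash matters: without it, ".." climbs one level further.

```java
import java.net.URI;

public class UriParentSketch {
    /** Parent of a directory-style URI, i.e. one whose path ends in "/". */
    static URI parentOfDirectory(URI dir) {
        return dir.resolve("..");
    }

    /** Directory containing a file-style URI, i.e. one with no trailing "/". */
    static URI directoryOfFile(URI file) {
        return file.resolve(".");
    }

    public static void main(String[] args) {
        URI subDir = URI.create("/foo/bar/baz/");
        System.out.println(parentOfDirectory(subDir));   // prints /foo/bar/

        URI file = URI.create("/foo/bar/baz.txt");
        System.out.println(directoryOfFile(file));       // prints /foo/bar/

        // Without the trailing slash, "baz" is treated as a file name,
        // so ".." resolves relative to /foo/bar/ and yields /foo/.
        System.out.println(URI.create("/foo/bar/baz").resolve(".."));  // prints /foo/
    }
}
```

So a FileSystem API built on URI would probably need a convention for whether directory names carry a trailing slash.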
It might also be useful to implement a URLStreamHandler, so that
one can create "hdfs:" URLs and use them wherever Java accepts
URLs, e.g., in classloaders, etc. But the URL class doesn't
support relative path name resolution, the primary feature we
require for names.
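A rough sketch of what such a handler could look like. Everything here is hypothetical: HdfsUrlSketch and StubHandler are invented names, and the connection just returns canned bytes where a real handler would talk to the namenode. The handler is passed directly to the URL constructor, so nothing is registered globally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

public class HdfsUrlSketch {
    /** Stand-in handler: a real one would open a stream from the filesystem. */
    static class StubHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    connected = true;  // nothing to connect to in this stub
                }

                @Override
                public InputStream getInputStream() {
                    // Canned content so the sketch runs without a cluster.
                    String body = "stub contents of " + getURL().getPath();
                    return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    /** Builds an "hdfs:" URL bound to the stub handler. */
    static URL hdfsUrl(String spec) throws IOException {
        return new URL(null, spec, new StubHandler());
    }

    /** Reads the whole stream behind the URL as UTF-8 text. */
    static String read(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = hdfsUrl("hdfs://namenode.example.com:9000/foo/bar.txt");
        System.out.println(url.getProtocol() + " " + url.getHost() + " " + url.getPort());
        System.out.println(read(url));
    }
}
```

To make "hdfs:" URLs work everywhere, the handler would presumably have to be installed through URL.setURLStreamHandlerFactory or the java.protocol.handler.pkgs property instead of being passed per URL.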
Unless there are objections, I'll start exploring replacing the
uses of java.io.File with java.net.URI.
My thinking is that we remove rather than deprecate the old
methods. This makes the change incompatible, but I think we really
want to get rid of the use of java.io.File. I'm willing to update
Nutch & unit tests as required, but this may break others' code.
Should we instead deprecate these in Hadoop 0.2 and then remove
them in 0.3? Thoughts?
FileSystem should not name files with java.io.File
--------------------------------------------------
Key: HADOOP-129
URL: http://issues.apache.org/jira/browse/HADOOP-129
Project: Hadoop
Type: Improvement
Components: fs
Versions: 0.1.1, 0.1.0
Reporter: Doug Cutting
Fix For: 0.2
In Hadoop's FileSystem API, files are currently named using
java.io.File. This is confusing, as many methods on that class
are inappropriate to call on Hadoop paths. For example, calling
isDirectory(), exists(), etc. on a java.io.File is not the same as
calling FileSystem.isDirectory() or FileSystem.exists() passing
that same file. Using java.io.File also makes correct operation
on Windows difficult, since java.io.File operates differently on
Windows in order to accommodate Windows path names. For example,
new File("/foo") is not absolute on Windows, and prints its path
as "\\foo", which causes confusion.
To fix this we could replace the uses of java.io.File in the
FileSystem API with String, a new FileName class, or perhaps
java.net.URI. The advantage of URI is that it can also naturally
include the namenode host and port. The disadvantage is that URI
does not support tree operations like getParent().
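To illustrate the host-and-port point, here is a small sketch using only java.net.URI. The "hdfs" scheme, the host name namenode.example.com, and port 9000 are placeholders chosen for the example, and NamenodeUriSketch is an invented class name.

```java
import java.net.URI;

public class NamenodeUriSketch {
    /** Summarizes the parts of a name that a FileSystem would care about. */
    static String describe(URI name) {
        return name.getScheme() + " | " + name.getHost() + " | "
                + name.getPort() + " | " + name.getPath();
    }

    public static void main(String[] args) {
        // A fully qualified name carries the namenode host and port.
        URI qualified = URI.create("hdfs://namenode.example.com:9000/foo/bar/");
        System.out.println(describe(qualified));
        // prints hdfs | namenode.example.com | 9000 | /foo/bar/

        // A bare path has no scheme or host, and getPort() reports -1.
        URI bare = URI.create("/foo/bar/");
        System.out.println(describe(bare));
        // prints null | null | -1 | /foo/bar/

        // Resolving a relative reference keeps the scheme, host and port.
        System.out.println(qualified.resolve(".."));
        // prints hdfs://namenode.example.com:9000/foo/
    }
}
```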
This change will cause a lot of incompatibility. Thus it should
probably be made early in a development cycle in order to maximize
the time for folks to adapt to it.