Doug Cutting wrote:
Paul Sheer wrote:
I have a requirement to use Hadoop with case-insensitivity and
case-preservation, à la Windows.
I think you may have difficulty convincing folks that Hadoop should
directly support this mode of operation, and it's also a bad idea to run
a hacked version of HDFS, since that will be hard to maintain.
The safest and simplest way to support this might be to layer it on top
of the standard API. You can implement a FilterFileSystem that, when
opening files or listing directories, uses case-insensitive comparisons.
So, to open "/foo/bar" you'd first list "/" looking for subdirectories
which case-insensitively match "foo", then, if one is found, list it
looking for a file which case-insensitively matches "bar". Could this
suffice?
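
A rough sketch of that lookup, for illustration only. To keep it
self-contained it does not touch the Hadoop API: the "lister" function
is a hypothetical stand-in for whatever directory listing call the
wrapped filesystem provides, and the class name is made up. Treat it as
one possible shape for the idea, not a tested FilterFileSystem.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: resolves a path to its stored spelling by listing
// each parent directory and matching children case-insensitively.
class CaseInsensitiveResolver {
    // Stand-in for a directory listing call: directory path -> child names.
    // Returns null when the path is not a directory.
    private final Function<String, List<String>> lister;

    CaseInsensitiveResolver(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    // Returns the stored path whose components match the requested ones
    // case-insensitively, or empty if any component has no match.
    Optional<String> resolve(String path) {
        String current = "/";
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            List<String> children = lister.apply(current);
            String match = null;
            if (children != null) {
                for (String child : children) {
                    // equalsIgnoreCase is locale-independent, which is only
                    // an approximation of Windows-style comparison.
                    if (child.equalsIgnoreCase(component)) {
                        match = child;
                        break;
                    }
                }
            }
            if (match == null) {
                return Optional.empty();
            }
            current = current.equals("/") ? "/" + match : current + "/" + match;
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        // In-memory tree standing in for a real filesystem.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("Foo", "tmp"));
        tree.put("/Foo", Arrays.asList("Bar"));

        CaseInsensitiveResolver resolver = new CaseInsensitiveResolver(tree::get);
        System.out.println(resolver.resolve("/foo/bar")); // Optional[/Foo/Bar]
        System.out.println(resolver.resolve("/foo/baz")); // Optional.empty
    }
}
```

Note that this costs one listing per path component, so deep paths in
large directories would be noticeably slower than a direct open.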
Doug
Full Windows case logic is pretty bizarre, as you need to ignore case in
all file operations; "mv lower LOWER" would result in a file called
"lower" because of the rule that if there is an existing file whose
name case-insensitively matches the destination name, the existing
name is kept as the destination name.
Other issues:
- it should be impossible to create two files in the same directory with
the same case-insensitive name.
- you need to take locale into account when comparing case. Turkey is
the test case, as "I".toLower() != "i" in the Turkish locale; it's the
place where you get the bug reports when your logic is broken.
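
To make the two issues above concrete, here is a small Java sketch. The
class and method names are made up for illustration, and folding with
Locale.ROOT is just one assumed policy for getting a locale-independent
comparison key; it is not what Windows does and not anything Hadoop
provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper showing the locale pitfall and a duplicate-name check.
class CaseFolding {
    // Comparison key that does not depend on the JVM's default locale.
    // Assumption: simple lower-casing is good enough for this sketch.
    static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // True if creating "candidate" would clash with an existing name
    // in the same directory under case-insensitive comparison.
    static boolean collides(List<String> existing, String candidate) {
        String wanted = key(candidate);
        for (String name : existing) {
            if (key(name).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // In the Turkish locale "I" lower-cases to dotless i (U+0131).
        System.out.println("I".toLowerCase(turkish).equals("i"));     // false
        System.out.println("I".toLowerCase(Locale.ROOT).equals("i")); // true

        List<String> directory = Arrays.asList("README", "data");
        System.out.println(collides(directory, "ReadMe")); // true
        System.out.println(collides(directory, "notes"));  // false
    }
}
```

Code that calls toLowerCase() with no argument picks up the default
locale of the machine it runs on, which is how the same build can pass
tests in one country and fail in another.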
I would stay well clear of it.