Doug Cutting wrote:
Paul Sheer wrote:
I have the requirement to use Hadoop with case-insensitivity and
case-preservation ala Windows.

I think you may have difficulty convincing folks that Hadoop should directly support this mode of operation, and it's also a bad idea to run a hacked version of HDFS, since that will be hard to maintain.

The safest and simplest way to support this might be to layer it on top of the standard API. You can implement a FilterFileSystem that, when opening files or listing directories, uses case-insensitive comparisons. So, to open "/foo/bar" you'd first list "/" looking for subdirectories which case-insensitively match "foo", then, if one is found, list it looking for a file which case-insensitively matches "bar". Could this suffice?
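
As a rough sketch of that walk (plain Java, not real Hadoop code): the class below resolves a path one component at a time against a directory listing. The names CaseInsensitiveResolver, resolve and fold are made up for illustration, and the lister function is a hypothetical stand-in for whatever listing call the wrapped file system provides. It folds case with Locale.ROOT and takes the first match, so it does not deal with two siblings that differ only in case.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

final class CaseInsensitiveResolver {

    /**
     * Resolves 'path' case-insensitively, one component at a time.
     * 'lister' is a hypothetical stand-in for a directory listing call:
     * it maps a directory path to the names of its children.
     * Returns the path with its real on-disk case, or empty if no match.
     */
    static Optional<String> resolve(Function<String, List<String>> lister, String path) {
        String current = "/";
        for (String wanted : path.split("/")) {
            if (wanted.isEmpty()) {
                continue; // skip empty pieces from leading or doubled slashes
            }
            String match = null;
            for (String child : lister.apply(current)) {
                if (fold(child).equals(fold(wanted))) {
                    match = child;
                    break; // first match wins; ambiguity is not handled here
                }
            }
            if (match == null) {
                return Optional.empty();
            }
            current = current.equals("/") ? "/" + match : current + "/" + match;
        }
        return Optional.of(current);
    }

    /** Locale-independent case folding (an assumption; see the locale caveat in this thread). */
    static String fold(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```

In a FilterFileSystem-based wrapper the lister would presumably delegate to the underlying file system's listing call; that wiring is left out here. Note that each lookup costs one listing per path component.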

Doug

Full Windows case logic is pretty bizarre, as you need to ignore case in all file operations; "mv lower LOWER" would result in a file called "lower", because of the rule that if there is a destination file whose case-insensitive name matches that of the target file, the existing file's name becomes the destination name.
Other issues:
- it should be impossible to create two files in the same directory with the same case-insensitive name.
- you need to take locale into account when comparing case. Turkey is the test case, as "I".toLower() != "i" in a Turkish locale; it's the place where you get the bug reports when your logic is broken.
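
To make both points concrete, here is a small Java sketch. CaseFoldDemo, Directory, create and storedName are hypothetical names used only for illustration; the only real API involved is String.toLowerCase(Locale). The toy Directory keys entries by a Locale.ROOT-folded name, which is one possible choice and not necessarily what Windows does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CaseFoldDemo {

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Locale-sensitive lower-casing: the result depends on the locale passed in. */
    static String lower(String name, Locale locale) {
        return name.toLowerCase(locale);
    }

    /** Toy directory that refuses names differing only in case, but keeps the original case. */
    static final class Directory {
        // folded name -> name as first created
        private final Map<String, String> byFoldedName = new HashMap<>();

        /** Returns true if created, false if a case-insensitive match already exists. */
        boolean create(String name) {
            return byFoldedName.putIfAbsent(name.toLowerCase(Locale.ROOT), name) == null;
        }

        /** Returns the preserved-case name for any case variant, or null if absent. */
        String storedName(String name) {
            return byFoldedName.get(name.toLowerCase(Locale.ROOT));
        }
    }
}
```

With this, lower("I", Locale.ROOT) is "i", but lower("I", CaseFoldDemo.TURKISH) is the dotless "ı" (U+0131), so a comparison that uses the JVM's default locale behaves differently on a Turkish machine. In the toy Directory, create("lower") succeeds and a later create("LOWER") is rejected.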

I would stay well clear of it.
