Re: Hadoop with case-preservation and case-insensitivity

Steve Loughran Mon, 09 Mar 2009 06:49:13 -0700

Doug Cutting wrote:

Paul Sheer wrote:
I have the requirement to use Hadoop with case-insensitivity and
case-preservation ala Windows.
I think you may have difficulty convincing folks that Hadoop should directly support this mode of operation, and it's also a bad idea to run a hacked version of HDFS, since that will be hard to maintain.
The safest and simplest way to support this might be to layer it on top of the standard API. You can implement a FilterFileSystem that, when opening files or listing directories, uses case-insensitive comparisons. So, to open "/foo/bar" you'd first list "/" looking for subdirectories which case-insensitively match "foo", then, if one is found, list it looking for a file which case-insensitively matches "bar". Could this suffice?
Doug
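
A rough sketch of the lookup Doug describes, for illustration only. It is not Hadoop code: the class name CaseInsensitiveResolver and the "lister" function are made up here, and the lister stands in for whatever directory-listing call a real FilterFileSystem subclass would make (for example FileSystem.listStatus), so that the example runs without Hadoop on the classpath. It folds case with Locale.ROOT, which is an assumption, not something the thread prescribes.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of Hadoop: resolves a path by listing each
// directory in turn and matching child names case-insensitively.
class CaseInsensitiveResolver {
    // Stand-in for a directory listing call: given a directory path, returns
    // the names of its children (empty list if the directory does not exist).
    private final Function<String, List<String>> lister;

    CaseInsensitiveResolver(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    // Returns the path with the case actually stored, or empty if some
    // component has no case-insensitive match.
    Optional<String> resolve(String path) {
        String current = "";
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            String dir = current.isEmpty() ? "/" : current;
            String match = null;
            for (String child : lister.apply(dir)) {
                if (fold(child).equals(fold(component))) {
                    match = child; // first match wins
                    break;
                }
            }
            if (match == null) {
                return Optional.empty();
            }
            current = current + "/" + match;
        }
        return Optional.of(current.isEmpty() ? "/" : current);
    }

    // Locale-independent folding: upper-case, then lower-case, both in ROOT.
    static String fold(String name) {
        return name.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }
}
```

Note that this costs one directory listing per path component, and that "first match wins" only gives a well-defined answer if the underlying store never holds two names that differ only in case.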

Full Windows case logic is pretty bizarre, as you need to ignore case in all file operations; "mv lower LOWER" would result in a file called "lower" because of the rule that if there is a destination file whose case-insensitive name matches that of the target file, it becomes the destination name.

Other issues:

- it should be impossible to create two files in the same directory with the same case-insensitive name.
- you need to take locale into account when comparing case. Turkey is the test case, as "I".toLower() != "i"; it's the place where you get the bug reports when your logic is broken.
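
To make both points concrete, here is a minimal sketch of a case-preserving, case-insensitive directory. It is purely illustrative: CaseInsensitiveDirectory is a hypothetical in-memory class, not anything in Hadoop, and folding names with String.toLowerCase(Locale) is an assumption about how one might compare them.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory directory, for illustration only.
class CaseInsensitiveDirectory {
    // Maps the lower-cased name to the name as originally created,
    // so the original case is preserved for listings.
    private final Map<String, String> entries = new HashMap<>();
    private final Locale locale;

    CaseInsensitiveDirectory(Locale locale) {
        this.locale = locale;
    }

    // Returns false if an entry with the same case-insensitive name
    // already exists; otherwise records the name and returns true.
    boolean create(String name) {
        return entries.putIfAbsent(name.toLowerCase(locale), name) == null;
    }

    // Returns the stored, case-preserved name, or null if absent.
    String lookup(String name) {
        return entries.get(name.toLowerCase(locale));
    }
}
```

With Locale.ROOT, creating "File.txt" and then "FILE.TXT" is rejected the second time, and looking up "file.TXT" returns "File.txt". With Locale.forLanguageTag("tr"), "FILE.TXT" lower-cases to a name containing the dotless "ı" (U+0131), so it no longer collides with "file.txt" and both creations succeed. The same code gives different answers depending on the locale it runs under, which is the kind of breakage the Turkish case exposes.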


I would stay very clear of it.
