OK. In particular, I'm looking at TasteHadoopUtils.readItemIDIndexMap(). Would it ever be fed a composite path like that?
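By "composite path" I mean a comma-joined string like the one in the workaround quoted below. Roughly, a consumer of such a string could do something like the following -- the splitting logic, bucket names, and method name are made up for illustration and are not necessarily what readItemIDIndexMap does:

void readAllParts(String compositePathString, Configuration conf) throws IOException {
  // e.g. compositePathString = "s3n://some-bucket/indexA,s3n://some-bucket/indexB" (hypothetical)
  for (String part : compositePathString.split(",")) {
    Path path = new Path(part);
    // Resolve the FileSystem from each part's own URI, so an s3n:// part is not
    // handed to the default (typically HDFS) FileSystem.
    FileSystem fs = FileSystem.get(path.toUri(), conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path.makeQualified(fs), conf);
    // ... read key/value pairs here, then reader.close()
  }
}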
I think you're also suggesting that it never hurts to qualify the path, so the utility class SequenceFileIterable ought to do this. Well, I'd rather err on the side of not breaking things, so I'll try not to change behavior here.

On Wed, Mar 23, 2011 at 7:24 PM, Sebastian Schelter <[email protected]> wrote:
> Those code pieces are from me and they were necessary to make combined
> paths like this work on S3 for me:
>
> Path combined = new Path(pathA + "," + pathB)
>
> It's been a quick (and somewhat ugly) workaround; if someone knows a better
> solution I'd be happy to see it refactored.
>
> --sebastian
>
>
> On 23.03.2011 20:17, Sean Owen wrote:
>>
>> I'm seeing a lot of code that goes out of its way to make a Path in
>> Hadoop fully qualified. It ends up taking a few lines of code, and I
>> suspect some of it is spurious. I'm trying to confirm my understanding
>> of when you would need a fully-qualified path.
>>
>> This seems to be necessary in general when sending around a Path, or
>> storing it, since a relative path is only partial information and
>> is valid only when the context (working directory) is known. Other
>> than that... it shouldn't be too necessary?
>>
>> I ask in part because I look at the following code and wonder how much
>> of it is necessary. Stripped down, it looks like this:
>>
>> void foo(String pathString, Configuration conf) {
>>   Path unqualified = new Path(pathString);
>>   FileSystem fs = FileSystem.get(unqualified.toUri(), conf);
>>   Path path = unqualified.makeQualified(fs);
>>   ...
>>   new SequenceFile.Reader(fs, new Path(path).makeQualified(fs), conf) ...
>>   ...
>> }
>>
>> Since I presume SequenceFile.Reader itself makes sense of the path in
>> the context of "conf" anyway, all the rest seems redundant.
>> Or, put another way, I don't see what these acrobatics can add --
>> whatever knowledge is in "conf" is already used deeper down in
>> SequenceFile.Reader.
>>
>> But I recall there's some subtlety with, say, handling s3:// and
>> s3n:// URLs here?
>>
>> Any comments on what's the right thing to do?
>
>
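For what it's worth, here is a minimal standalone example of what qualification buys us, and of why one would resolve the FileSystem from the path's own URI rather than from the default (the s3:// vs. s3n:// subtlety above). The namenode address, bucket, and directory names are invented:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // assume fs.default.name = hdfs://namenode:8020

    // A path carrying its own scheme should be resolved against *its* FileSystem;
    // FileSystem.get(conf) would return the default (HDFS) implementation instead.
    Path s3Path = new Path("s3n://some-bucket/some/dir");   // hypothetical
    FileSystem s3Fs = FileSystem.get(s3Path.toUri(), conf);

    // makeQualified fills in the missing scheme/authority (and resolves relative
    // paths against the working directory), so the Path carries its full context.
    Path relative = new Path("temp/itemIDIndex");           // hypothetical
    FileSystem defaultFs = FileSystem.get(conf);
    System.out.println(relative.makeQualified(defaultFs));  // e.g. hdfs://namenode:8020/user/<you>/temp/itemIDIndex
    System.out.println(s3Path.makeQualified(s3Fs));         // already fully qualified; unchanged
  }
}

The point is just that once a Path is qualified, it can be passed around or stored without depending on the default FileSystem or the working directory.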
