Disregard. I figured this one out. The error was caused by calling
MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fileSys,
outDir, defaults);
with the wrong path for outDir.
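For anyone who hits the same "/ by zero": the stack trace in the quoted message below points at HashPartitioner.getPartition, which suggests the partition count was zero, presumably because getReaders found no MapFiles under the wrong outDir. Here is a minimal, Hadoop-free sketch of that arithmetic, assuming the partitioner takes the key's non-negative hash modulo the number of readers. The class PartitionSketch and both method names are hypothetical, and the guard is just one way to get a clearer error.

```java
// Self-contained sketch; no Hadoop classes are used here.
final class PartitionSketch {
    // Assumed arithmetic: non-negative hash of the key, modulo the partition count.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical guard: fail with a readable message when no readers were found.
    static int checkedPartitionFor(Object key, int numReaders) {
        if (numReaders <= 0) {
            throw new IllegalStateException(
                "No MapFile readers found; check the outDir path");
        }
        return partitionFor(key, numReaders);
    }

    public static void main(String[] args) {
        // With 11 readers the key maps to exactly one of them (index 0..10).
        System.out.println(partitionFor("mykey", 11));
        try {
            // With zero readers the modulo itself throws.
            partitionFor("mykey", 0);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // prints "/ by zero"
        }
    }
}
```

If this assumption holds, it also means only the one reader chosen for the key is consulted, not all 11 indexes.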
In case anyone wants an example of how to do this later on: I also had
to pass a non-null value object to getEntry:
Text myEntry = new Text();
MapFileOutputFormat.getEntry(readers, part, new Text("mykey"), myEntry);
This method's Javadoc should be updated to make this clearer: it both
fills in the value object passed in and returns it. Or better yet,
change the method. Unless I am missing something, I don't see why you
should have to pass in a value at all, since we really want to retrieve
by key.
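To make the "fills in and returns" behavior concrete, here is a small in-memory stand-in, not the Hadoop implementation. EntrySketch, its TreeMap backing store, and the use of StringBuilder as the reusable value object are all hypothetical; the sketch also assumes a missing key yields null.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a single MapFile: sorted keys mapped to values.
final class EntrySketch {
    private final Map<String, String> data = new TreeMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    // Described contract: overwrite the caller's value object and return that
    // same object, or return null when the key is absent (assumed here).
    StringBuilder getEntry(String key, StringBuilder value) {
        String found = data.get(key);
        if (found == null) {
            return null;
        }
        value.setLength(0);   // a null 'value' would fail here
        value.append(found);
        return value;
    }

    // The alternative suggested above: look up by key only.
    String get(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        EntrySketch file = new EntrySketch();
        file.put("mykey", "myvalue");
        StringBuilder myEntry = new StringBuilder();
        StringBuilder returned = file.getEntry("mykey", myEntry);
        System.out.println(returned == myEntry); // prints "true"
        System.out.println(myEntry);             // prints "myvalue"
    }
}
```

Passing the value in lets the caller reuse one object across many lookups, which may be the reason for the signature, though that is a guess on my part.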
Cheers,
-Xavier
-----Original Message-----
From: Xavier Stevens
Sent: Monday, March 10, 2008 5:09 PM
To: [email protected]
Subject: RE: What's the best way to get to a single key?
So I read some more through the Javadocs. I had 11 reducers on my
original job leaving me 11 MapFile directories. I am passing in their
parent directory here as "outDir".
MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fileSys,
    outDir, defaults);
Partitioner part =
    (Partitioner)ReflectionUtils.newInstance(conf.getPartitionerClass(), conf);
Text entryValue = (Text)MapFileOutputFormat.getEntry(readers,
    part, new Text("mykey"), null);
System.out.println("My Entry's Value: ");
System.out.println(entryValue.toString());
But I am getting an exception:
Exception in thread "main" java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.mapred.lib.HashPartitioner.getPartition(HashPartitioner.java:35)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:85)
    at mypackage.MyClass.main(MyClass.java:110)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
I am assuming I am doing something wrong, but I'm not sure what it is
yet. Any ideas?
-Xavier
-----Original Message-----
From: Xavier Stevens
Sent: Mon 3/10/2008 3:49 PM
To: [email protected]
Subject: RE: What's the best way to get to a single key?
I was thinking it would be easier to search a single index, unless I
don't have to worry and Hadoop searches all my indexes at the same
time. Is this the case?
-Xavier
-----Original Message-----
From: Doug Cutting
Sent: Monday, March 10, 2008 3:45 PM
To: [email protected]
Subject: Re: What's the best way to get to a single key?
Xavier Stevens wrote:
> Thanks for everything so far. It has been really helpful. I have one
> more question. Is there a way to merge MapFile index/data files?
No.
To append text files you can use 'bin/hadoop fs -getmerge'.
To merge sorted SequenceFiles (like MapFile/index files) you can use:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.Sorter.html#merge(org.apache.hadoop.fs.Path[], org.apache.hadoop.fs.Path, boolean)
But this doesn't generate a MapFile.
Why is a single file preferable?
Doug