RE: What's the best way to get to a single key?

Xavier Stevens Tue, 11 Mar 2008 11:19:07 -0700

Disregard.  I figured this one out.  The error was caused by calling

MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fileSys,
    outDir, defaults);

with the wrong path for outDir.
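
A likely reading of why a wrong outDir surfaces as "/ by zero": as far as I can tell, getEntry passes readers.length to the partitioner as the partition count, and HashPartitioner takes the key's non-negative hash modulo that count. If getReaders finds no MapFile directories under the path, the array is empty and the modulo divides by zero. The sketch below reproduces only that arithmetic in plain Java, with no Hadoop dependency. The class name and the checkedPartition guard are hypothetical, and a String stands in for the Text key (Text hashes differently, so the partition numbers will not match a real job).

```java
final class PartitionSketch {
    // Same arithmetic as a hash partitioner: clear the sign bit,
    // then take the hash modulo the number of partitions.
    public static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical guard: fail with a readable message when no readers
    // were opened, instead of letting the modulo divide by zero.
    public static int checkedPartition(Object key, int numReaders) {
        if (numReaders == 0) {
            throw new IllegalStateException(
                "No MapFile readers found; check the output directory path");
        }
        return getPartition(key, numReaders);
    }

    public static void main(String[] args) {
        // With 11 readers (one per reducer) the result is an index in 0..10.
        System.out.println(getPartition("mykey", 11));

        // With zero readers the modulo throws ArithmeticException.
        try {
            getPartition("mykey", 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

Checking readers.length right after getReaders returns would have pointed at the bad path directly.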

In case anyone wants an example of how to do this later on: I also had
to pass a non-null value object to getEntry:

Text myEntry = new Text();
MapFileOutputFormat.getEntry(readers, part, new Text("mykey"), myEntry);

This method's Javadoc should be updated to make this clearer.  The
method both fills in the value object passed in and returns it.  Or
better yet, change the method.  Unless I am missing something, I don't
see why you should have to pass in a value at all, since we really want
to retrieve by key.
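
To make the "fills in and returns" contract concrete, here is a toy model in plain Java. It is not the Hadoop API: ToyReader, MutableText and lookup are hypothetical stand-ins for MapFile.Reader, Text and a key-only convenience wrapper of the kind suggested above. The get method copies the stored value into the caller's object and returns that same object, or returns null when the key is absent.

```java
import java.util.TreeMap;

final class LookupSketch {
    // Stand-in for a mutable, reusable value object such as Text.
    static final class MutableText {
        private String s = "";
        void set(String v) { s = v; }
        @Override public String toString() { return s; }
    }

    // Stand-in for one reader over a sorted key/value file.
    static final class ToyReader {
        private final TreeMap<String, String> data = new TreeMap<>();
        void put(String k, String v) { data.put(k, v); }

        // Copies the stored value into `val` and returns `val`;
        // returns null (leaving `val` untouched) when the key is absent.
        MutableText get(String key, MutableText val) {
            String found = data.get(key);
            if (found == null) return null;
            val.set(found);
            return val;
        }
    }

    // Key-only wrapper: picks the partition, allocates the value
    // object itself, and hands back a plain String (or null).
    static String lookup(ToyReader[] readers, String key) {
        if (readers.length == 0) {
            throw new IllegalStateException("no readers opened");
        }
        int part = (key.hashCode() & Integer.MAX_VALUE) % readers.length;
        MutableText result = readers[part].get(key, new MutableText());
        return result == null ? null : result.toString();
    }

    public static void main(String[] args) {
        ToyReader[] readers = { new ToyReader(), new ToyReader(), new ToyReader() };
        String key = "mykey";
        // Store the entry in the partition the same hash would select.
        readers[(key.hashCode() & Integer.MAX_VALUE) % readers.length].put(key, "myvalue");

        System.out.println(lookup(readers, key));      // prints "myvalue"
        System.out.println(lookup(readers, "absent")); // prints "null"
    }
}
```

With a wrapper like this the caller never sees the value object, at the cost of one allocation per lookup.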

Cheers,

-Xavier

-----Original Message-----
From: Xavier Stevens
Sent: Monday, March 10, 2008 5:09 PM
To: [email protected]
Subject: RE: What's the best way to get to a single key?

So I read some more through the Javadocs.  I had 11 reducers on my
original job leaving me 11 MapFile directories.  I am passing in their
parent directory here as "outDir".

MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fileSys,
    outDir, defaults);
Partitioner part = (Partitioner)ReflectionUtils.newInstance(
    conf.getPartitionerClass(), conf);
Text entryValue = (Text)MapFileOutputFormat.getEntry(readers, part,
    new Text("mykey"), null);
System.out.println("My Entry's Value: ");
System.out.println(entryValue.toString());

But I am getting an exception:

Exception in thread "main" java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.mapred.lib.HashPartitioner.getPartition(HashPartitioner.java:35)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:85)
        at mypackage.MyClass.main(MyClass.java:110)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

I am assuming I am doing something wrong, but I'm not sure what it is
yet.  Any ideas?

-Xavier

-----Original Message-----
From: Xavier Stevens
Sent: Mon 3/10/2008 3:49 PM
To: [email protected]
Subject: RE: What's the best way to get to a single key?

I was thinking a single file would be easier to search, since there
would be only one index.  Unless I don't have to worry about that and
Hadoop searches all my indexes at the same time.  Is that the case?

-Xavier

-----Original Message-----
From: Doug Cutting
Sent: Monday, March 10, 2008 3:45 PM
To: [email protected]
Subject: Re: What's the best way to get to a single key?

Xavier Stevens wrote:
> Thanks for everything so far.  It has been really helpful.  I have one
> more question.  Is there a way to merge MapFile index/data files?

No.

To append text files you can use 'bin/hadoop fs -getmerge'.

To merge sorted SequenceFiles (like MapFile/index files) you can use:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.Sorter.html#merge(org.apache.hadoop.fs.Path[], org.apache.hadoop.fs.Path, boolean)

But this doesn't generate a MapFile.

Why is a single file preferable?

Doug
