Is there a way to do this when your input data is using SequenceFile
compression?

Thanks,

-Xavier 

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 03, 2008 2:52 PM
To: [email protected]
Subject: Re: What's the best way to get to a single key?

Use MapFileOutputFormat to write your data, then call:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

The documentation is pretty sparse, but the intent is that you open a
MapFile.Reader for each mapreduce output, pass the partitioner used, the
key, and the value to be read into.

A MapFile maintains an index of keys, so the entire file need not be
scanned.  If you really only need the value of a single key then you
might avoid opening all of the output files.  In that case you could
use the Partitioner and the MapFile API directly.
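For reference, a minimal sketch of the lookup Doug describes might look like
the following.  It assumes the old org.apache.hadoop.mapred API of that era;
the output path, key string, and key/value types (Text/Text) are hypothetical
and would need to match your actual job:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapFileOutputFormat;
import org.apache.hadoop.mapred.Partitioner;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class MapFileLookup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    FileSystem fs = FileSystem.get(conf);

    // Directory holding the part-* MapFiles the job wrote
    // via MapFileOutputFormat (hypothetical path).
    Path outDir = new Path("/user/xavier/job-output");

    // One MapFile.Reader per reduce output.
    MapFile.Reader[] readers =
        MapFileOutputFormat.getReaders(fs, outDir, conf);

    // Must be the same Partitioner the job used, so the key is
    // looked up in the same output file it was written to.
    Partitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();

    Text key = new Text("the-key-to-find");  // hypothetical key
    Text value = new Text();                 // filled in by getEntry

    // getEntry picks the right reader via the partitioner, then
    // uses the MapFile index to seek to the key without a full scan.
    Writable result =
        MapFileOutputFormat.getEntry(readers, partitioner, key, value);
    System.out.println(result == null ? "not found" : value.toString());

    for (MapFile.Reader reader : readers) {
      reader.close();
    }
  }
}
```

Because getEntry only consults the one partition the key hashes to and then
binary-searches that MapFile's index, the cost is a couple of seeks rather
than a scan of the job output.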

Doug


Xavier Stevens wrote:
> I am curious how others might be solving this problem.  I want to 
> retrieve a record from HDFS based on its key.  Are there any methods 
> that can shortcut this type of search to avoid parsing all data until 
> you find it?  Obviously Hbase would do this as well, but I wanted to 
> know if there is a way to do it using just Map/Reduce and HDFS.
> 
> Thanks,
> 
> -Xavier
> 
