Is there a way to do this when your input data is using SequenceFile compression?
Thanks,

-Xavier

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, March 03, 2008 2:52 PM
To: [email protected]
Subject: Re: What's the best way to get to a single key?

Use MapFileOutputFormat to write your data, then call:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

The documentation is pretty sparse, but the intent is that you open a MapFile.Reader for each mapreduce output, then pass in the partitioner used, the key, and the value to be read into. A MapFile maintains an index of keys, so the entire file need not be scanned.

If you really only need the value of a single key, then you might avoid opening all of the output files. In that case you might use the Partitioner and the MapFile API directly.

Doug

Xavier Stevens wrote:
> I am curious how others might be solving this problem. I want to
> retrieve a record from HDFS based on its key. Are there any methods
> that can shortcut this type of search to avoid parsing all the data
> until you find it? Obviously HBase would do this as well, but I wanted
> to know if there is a way to do it using just Map/Reduce and HDFS.
>
> Thanks,
>
> -Xavier
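To illustrate the idea Doug describes, here is a minimal, self-contained sketch (plain Java, no Hadoop dependency, so the class name and data are hypothetical). It mimics what getEntry does under the hood: the same hash partitioning Hadoop's default HashPartitioner uses picks which output file could hold the key, and a sorted per-partition index (standing in for a MapFile's index) answers the lookup without scanning every record:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SingleKeyLookup {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then mod by the number of reduce partitions.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Stand-ins for the sorted part-00000..part-00003 MapFiles:
        // one sorted index per reduce partition.
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }

        // "MapReduce output": each record lands in the partition
        // chosen by the partitioner, sorted by key within it.
        String[][] records = {
            {"apple", "1"}, {"banana", "2"}, {"cherry", "3"}, {"durian", "4"}
        };
        for (String[] r : records) {
            parts.get(partitionFor(r[0], numPartitions)).put(r[0], r[1]);
        }

        // Single-key lookup: recompute the partition for the key,
        // then consult only that one file's index -- no full scan,
        // and no need to open the other output files.
        String key = "cherry";
        String value = parts.get(partitionFor(key, numPartitions)).get(key);
        System.out.println(key + " -> " + value);
    }
}
```

The crucial point is that the lookup must use the *same* Partitioner the job used when writing, otherwise it will open the wrong file; that is why getEntry takes the Partitioner as an argument.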
