Hello, I've got one more question: how is the seek() (or get()) method implemented in MapFile.Reader? Does it use hashCode(), compareTo(), or another mechanism to find a match in the MapFile's index?
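
To illustrate what I mean, a minimal lookup sketch (the key/value types and the path are made-up placeholders):

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // a MapFile is a directory holding a sorted "data" file plus an "index" file
    MapFile.Reader reader = new MapFile.Reader(fs, "/path/to/mapfile", conf);

    Text value = new Text();
    // this is the call in question: how does it locate "someKey" in the
    // index -- via hashCode(), compareTo(), or something else?
    if (reader.get(new Text("someKey"), value) != null) {
        System.out.println(value);
    }
    reader.close();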

Thanks for your reply.
Ondrej Klimpera

On 03/29/2012 08:26 PM, Ondřej Klimpera wrote:
Thanks for your fast reply, I'll try this approach:)

On 03/29/2012 05:43 PM, Deniz Demir wrote:
Not sure if this helps in your use case, but you can put all the output files into the distributed cache and then access them in the subsequent map-reduce job (in the driver code):

    // previous MR job's output directory on HDFS
    String pstr = "hdfs://<output_path>/";
    FileSystem fs = FileSystem.get(job.getConfiguration());
    FileStatus[] files = fs.listStatus(new Path(pstr));
    for (FileStatus f : files) {
        if (!f.isDir()) {
            // register each output file with the distributed cache
            DistributedCache.addCacheFile(f.getPath().toUri(),
                    job.getConfiguration());
        }
    }

I think you can also copy these files to a different location in DFS and then put them into the distributed cache.
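
On the consuming side, the second job's tasks can then list the local copies in setup(); a minimal sketch (method shown outside its Mapper class):

    @Override
    protected void setup(Context context) throws IOException {
        // local, task-node paths of everything added via addCacheFile()
        Path[] localFiles =
                DistributedCache.getLocalCacheFiles(context.getConfiguration());
        // ... open and read the files from the local filesystem here
    }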


Deniz


On Mar 29, 2012, at 8:05 AM, Ondřej Klimpera wrote:

Hello,

I have a MapFile as the product of a MapReduce job, and what I need to do is:

1. If the MapReduce job produced multiple splits as output, merge them into a single file.

2. Copy this merged MapFile to another HDFS location and use it as a distributed cache file for another MapReduce job.

I'm wondering whether it is even possible to merge MapFiles, given their nature, and use them as a distributed cache file.
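
One alternative I've seen suggested, in case a physical merge turns out to be awkward: skip the merge and open all the part MapFiles at once, letting the job's partitioner route each lookup to the right part. A sketch using the old "mapred" API helpers (the output path and key/value types are placeholders):

    FileSystem fs = FileSystem.get(conf);
    // one reader per part-* MapFile under the first job's output directory
    MapFile.Reader[] readers =
            MapFileOutputFormat.getReaders(fs, new Path("<output_path>"), conf);
    Partitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
    Text value = new Text();
    // picks the reader whose partition the key falls into and calls get() on it
    Writable entry =
            MapFileOutputFormat.getEntry(readers, partitioner, new Text("someKey"), value);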

What I'm trying to achieve is repeated, fast lookups in this file during another MapReduce job.
If my idea is absolutely wrong, can you give me a tip on how to do it?
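
Roughly, what I have in mind for the second job's mapper (names are placeholders; this assumes the MapFile directory was shipped through the distributed cache as an archive, so that the first local archive path points at the unpacked MapFile directory):

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private MapFile.Reader reader;
        private final Text found = new Text();

        @Override
        protected void setup(Context context) throws IOException {
            Configuration conf = context.getConfiguration();
            Path[] local = DistributedCache.getLocalCacheArchives(conf);
            // open the reader once; each later get() is only an index seek
            reader = new MapFile.Reader(FileSystem.getLocal(conf),
                    local[0].toString(), conf);
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // one lookup per input record
            if (reader.get(new Text(line.toString()), found) != null) {
                context.write(line, found);
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            reader.close();
        }
    }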

The file is expected to be about 20 MB in size.
I'm using Hadoop 0.20.203.

Thanks for your reply:)

Ondrej Klimpera


