Hi Ondrej,
On 02.04.2012 13:00, Ondřej Klimpera wrote:
Ok, thanks.
I missed the setup() method because I'm using an older version of Hadoop,
so I suppose the configure() method does the same thing in Hadoop 0.20.203.
Aha, if possible, try upgrading. I don't know how good support is for
versions older than the Hadoop 0.20 branch.
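For what it's worth, in the old (mapred) API configure(JobConf) is indeed
the per-task init hook, called once before any records are processed. A
minimal sketch (the class name is made up):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  public void configure(JobConf job) {
    // one-time per-task initialization (open readers, read job parameters, ...)
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    // per-record work goes here
  }
}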
Now I'm able to load a map file inside the configure() method into a
MapFile.Reader instance held as a private class variable, and all works
fine. I'm just wondering whether the MapFile is replicated on HDFS and its
data is read locally, or whether reading from this file will increase
network traffic by fetching its data from another node in the Hadoop
cluster.
You could use a method-local variable instead of a private field if you load
the file. If the MapFile is written to HDFS then yes, it is replicated, and
you can configure the replication factor at file creation (and possibly
later). If you use DistributedCache then the files are not written to
HDFS, but to the mapred.local.dir [1] folder on every node.
The folder size is configurable, so it's possible that the data will still
be available there for the next MR job, but don't rely on this.
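To make the two options concrete, here is a rough sketch; the paths, class
name and replication factor are made up, so check them against your setup:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CacheSetup.class);
    FileSystem fs = FileSystem.get(conf);

    // Option 1: keep the MapFile on HDFS and raise its replication factor.
    // A MapFile is a directory holding a "data" and an "index" file, so the
    // replication is set on both; tasks still read blocks over the network
    // unless a replica happens to sit on the same node.
    fs.setReplication(new Path("/data/lookup.map/data"), (short) 10);
    fs.setReplication(new Path("/data/lookup.map/index"), (short) 10);

    // Option 2: ship a plain file via DistributedCache; it is copied once
    // to mapred.local.dir on every task node and read from local disk.
    DistributedCache.addCacheFile(new URI("/data/lookup.txt"), conf);
  }
}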
Please read the docs, I may be getting things wrong. RTFM will save your life ;).
[1] http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
[2] https://forums.aws.amazon.com/message.jspa?messageID=152538
Hopefully the last question to bother you with: is reading files from the
DistributedCache (a normal text file) limited to a particular job?
Before running a job I add a file to the DistributedCache. When getting the
file in the Reducer implementation, can it access DistributedCache files
from other jobs? In other words, what will this code list:
// Reducer impl.
public void configure(JobConf job) {
    try {
        URI[] distCacheFileUris = DistributedCache.getCacheFiles(job);
    } catch (IOException e) {
        throw new RuntimeException(e); // getCacheFiles() throws IOException
    }
}
Will the distCacheFileUris variable contain only the URIs for this job, or
those for any job running on the Hadoop cluster?
Hope it's understandable.
Thanks.
It's per-job as far as I know: DistributedCache.getCacheFiles() just reads
the URIs stored in that job's configuration (mapred.cache.files), so you
will only see the files added for the job whose JobConf you pass in.
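A quick sketch of the round trip (the file name is made up):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheScope {
  public static void main(String[] args) throws IOException {
    // Driver side, before submitting job A:
    JobConf conf = new JobConf(CacheScope.class);
    DistributedCache.addCacheFile(URI.create("/user/ondrej/lookup.txt"), conf);

    // Task side (e.g. in configure()), given job A's JobConf:
    URI[] uris = DistributedCache.getCacheFiles(conf);        // only job A's URIs
    Path[] local = DistributedCache.getLocalCacheFiles(conf); // localized copies to open

    // Note: getLocalCacheFiles() only returns paths once the framework has
    // localized the files on the task node, i.e. inside a running task.
  }
}

Since the URIs live in the job's own configuration, a reducer never sees
cache entries registered by other jobs, even if their local copies still
happen to sit in mapred.local.dir.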
--
Ioan Eugen Stan
http://ieugen.blogspot.com