Sean Shanny wrote:
To all,
Version: hadoop-0.17.2.1-core.jar
I have created a MapFile.
What I don't seem to be able to do is correctly place the MapFile in
the DistributedCache and then make use of it in a map method.
I need the following info please:
1. How and where to place the MapFile directory so that it is
visible to the hadoop job.
You have to place your files in DFS. If it is a directory, you can
place an archive of it.
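As a rough illustration of the "archive of it" step, here is a minimal
sketch that zips a MapFile directory using only the JDK. The class and
method names (MapFileArchiver, zipDirectory) are made up for this
example, and it assumes zip is an archive format the DistributedCache
will unpack. Each entry is prefixed with the directory name, so the
archive unpacks to <dir>/data and <dir>/index.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Hadoop.
public class MapFileArchiver {

    /**
     * Zips the regular files directly inside mapFileDir (no recursion)
     * and returns how many entries were written. zipFile must live
     * outside mapFileDir, otherwise it would try to include itself.
     */
    public static int zipDirectory(Path mapFileDir, Path zipFile) throws IOException {
        int count = 0;
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             DirectoryStream<Path> files = Files.newDirectoryStream(mapFileDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                // Keep the directory name as a prefix so the archive
                // unpacks to <dir>/data and <dir>/index.
                String entryName = mapFileDir.getFileName() + "/" + file.getFileName();
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }
}
```

The resulting zip would still have to be copied into DFS before the job
can refer to it; this sketch only covers the local packaging.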
2. How to add the files to the DistributedCache.
You can use DistributedCache.addCacheFile or
DistributedCache.addCacheArchive.
For more documentation, see
http://hadoop.apache.org/core/docs/r0.17.2/api/org/apache/hadoop/filecache/DistributedCache.html
and
http://hadoop.apache.org/core/docs/r0.17.2/mapred_tutorial.html#DistributedCache
3. How to create a MapFile.Reader from files in the DistributedCache.
I didn't understand what you want to do here. Do you want to see the
files in the MapFile directory, or do you want them on the classpath?
For the classpath, you can use DistributedCache.addFileToClassPath or
DistributedCache.addArchiveToClassPath.
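If the goal is to open the MapFile itself on the task node, one
possible approach is to find the unpacked directory among the localized
cache paths first. The sketch below uses only the JDK and relies on a
MapFile being a directory that holds a "data" file and an "index" file.
MapFileLocator and findMapFileDir are hypothetical names. It also
assumes the candidate paths come from something like
DistributedCache.getLocalCacheArchives(conf), converted to
java.io.File, and that the archive may unpack into a subdirectory.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop.
public class MapFileLocator {

    /** True if dir holds the two files a MapFile directory consists of. */
    static boolean isMapFileDir(File dir) {
        return new File(dir, "data").isFile() && new File(dir, "index").isFile();
    }

    /**
     * Returns the first candidate, or immediate subdirectory of a
     * candidate, that looks like a MapFile directory; null if none does.
     */
    public static File findMapFileDir(File[] localizedPaths) {
        for (File candidate : localizedPaths) {
            if (isMapFileDir(candidate)) {
                return candidate;
            }
            File[] children = candidate.listFiles();
            if (children == null) {
                continue; // not a directory, or not readable
            }
            for (File child : children) {
                if (child.isDirectory() && isMapFileDir(child)) {
                    return child;
                }
            }
        }
        return null;
    }
}
```

The directory found this way would then be the dirName to hand to a
MapFile.Reader opened against the local file system (for example via
FileSystem.getLocal(conf)) rather than against DFS. Treat that wiring
as an assumption to check against the API docs for your version.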
Hope this helps.
Thanks
Amareshwari
I can get this to work with a local file on a single-node system
outside of the DistributedCache, but for the life of me I cannot get it
to work within the DistributedCache.
We are trying to load up key-value mappings for a Data Warehouse ETL
process. The mapper will take an input record, look up the keys based
on values, and emit the resulting key-only record.
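To make the intended lookup concrete, here is a minimal sketch of the
per-record logic with a plain java.util.Map standing in for the
MapFile.Reader. The class name, the tab-separated record layout, and
the choice to return null for an unmapped value are all assumptions
made for illustration, not our actual job code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper's lookup step.
public class KeyLookupSketch {

    /**
     * Replaces every tab-separated value in the record with its key
     * from the lookup table. Returns null if any value has no mapping,
     * so the caller can decide whether to skip or report the record.
     */
    public static String toKeyOnlyRecord(String record, Map<String, String> lookup) {
        List<String> keys = new ArrayList<String>();
        // Limit -1 keeps trailing empty fields instead of dropping them.
        for (String value : record.split("\t", -1)) {
            String key = lookup.get(value);
            if (key == null) {
                return null; // unmapped value
            }
            keys.add(key);
        }
        return String.join("\t", keys);
    }
}
```

With a table mapping "Boston" to "17" and "widget" to "42" (made-up
data), the record "Boston<TAB>widget" would come out as "17<TAB>42".
In the real job the Map would be replaced by lookups against the
MapFile.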
Happy to answer any questions to help me make this work.
Thanks.
--sean