Re: where distributed cache start working

Arun C Murthy Sun, 22 Aug 2010 18:38:48 -0700

Moving to mapreduce-user@, bcc common-...@. Please use the project-specific lists.

DistributedCache.purgeCache isn't a public API. You shouldn't be calling it from the task.

A simple way of doing what you want is to change the mtime of the cache files on HDFS.
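
A minimal sketch of why an mtime change would force re-distribution, assuming the node-local cache decides freshness by comparing the modification time it recorded at the last copy with the file's current one. The class MtimeCacheSketch and its localize method are hypothetical stand-ins written for illustration, not Hadoop code, and the demo touches a local temp file rather than HDFS. On HDFS the corresponding call would presumably be FileSystem.setTimes(Path, long mtime, long atime).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

public class MtimeCacheSketch {
    // Hypothetical stand-in for a node-local cache:
    // source path -> mtime observed when the file was last copied.
    private final Map<String, Long> localizedMtime = new HashMap<>();

    /** Returns true if the file had to be (re)copied, false if the cached copy was reused. */
    public boolean localize(String source, long sourceMtime) {
        Long cached = localizedMtime.get(source);
        if (cached != null && cached == sourceMtime) {
            return false; // same mtime: the cached copy is treated as fresh
        }
        localizedMtime.put(source, sourceMtime); // pretend to download the file again
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cache-file", ".txt");
        MtimeCacheSketch cache = new MtimeCacheSketch();

        long before = Files.getLastModifiedTime(file).toMillis();
        System.out.println("first job copied:  " + cache.localize(file.toString(), before));
        System.out.println("second job copied: " + cache.localize(file.toString(), before));

        // Bump the mtime by a minute without changing the contents.
        Files.setLastModifiedTime(file, FileTime.fromMillis(before + 60_000));
        long after = Files.getLastModifiedTime(file).toMillis();
        System.out.println("after touch copied: " + cache.localize(file.toString(), after));

        Files.delete(file);
    }
}
```

Under this assumption, running the same job repeatedly reuses the local copy until the source file's mtime moves, which matches the behaviour described in the question below.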


Arun

On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:

Thanks Jeff.
However, are you sure TaskRunner.run() is also used in the new API? I used btrace to
trace the function calls but didn't find this function being called anywhere.


One more question about the distributed cache. After I call
DistributedCache.purgeCache, I think the local cached files should be deleted or
invalidated. However, when I run the same job multiple times with the purge
operation at the end, I find the local files have never been deleted and their
modification time is still that of the first job run. How can I ask my job to
redistribute the cache anyway?

Thanks,
-Gang




----- Original Message ----
From: Jeff Zhang <zjf...@gmail.com>
To: common-dev@hadoop.apache.org
Sent: 2010/8/20 (Fri) 11:22:49 AM
Subject: Re: where distributed cache start working

Hi Gang,

In the TaskRunner's run() method, Hadoop downloads the cache files
that you set on the client side to the local disk, so the forked child JVM
can use these cache files locally.
On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <lgpub...@yahoo.com.cn> wrote:
Hi all,
I went through the code, but couldn't find the place where the distributed cache
starts working. I want to know, between DistributedCache.addCacheFile at the
master node and DistributedCache.getLocalCacheFiles at the client side, when
and where the files get distributed.


Thanks,
-Gang
--
Best Regards

Jeff Zhang
