Moving to mapreduce-user@, bcc common-...@. Please use the project-specific lists.

DistributedCache.purgeCache isn't a public API. You shouldn't be calling it from a task.

A simple way of doing what you want is to change the mtime of the cache files on HDFS.
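To illustrate why touching the mtime helps, here is a minimal local sketch of mtime-based cache invalidation. It is not Hadoop code: the class MtimeCacheSketch and its methods localize and touch are hypothetical names, and it uses plain java.nio on the local file system as a stand-in for HDFS. It assumes the cache decides freshness by comparing the source file's modification time with the one recorded when the file was last copied. On HDFS itself, one way to change the mtime should be FileSystem.setTimes(path, mtime, atime), or simply re-uploading the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/** Local stand-in for mtime-based cache invalidation (hypothetical, not Hadoop code). */
public class MtimeCacheSketch {
    // Hypothetical bookkeeping: source path -> mtime recorded when it was last localized.
    private final Map<Path, Long> localizedMtime = new HashMap<>();
    private final Path cacheDir;

    public MtimeCacheSketch(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /**
     * Copies the source into the cache directory unless the recorded mtime
     * matches the current one. Returns true if a copy actually happened.
     */
    public boolean localize(Path source) throws IOException {
        long mtime = Files.getLastModifiedTime(source).toMillis();
        Long seen = localizedMtime.get(source);
        if (seen != null && seen == mtime) {
            return false; // same mtime as last time: cached copy is treated as fresh
        }
        Files.copy(source, cacheDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        localizedMtime.put(source, mtime);
        return true;
    }

    /** Changes only the modification time; the file content is left untouched. */
    public static void touch(Path file, long newMtimeMillis) throws IOException {
        Files.setLastModifiedTime(file, FileTime.fromMillis(newMtimeMillis));
    }
}
```

In this sketch, calling localize twice on an unchanged file copies it only once; calling touch on the source in between makes the second localize copy it again, which is the effect wanted when rerunning the same job.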

Arun

On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:

Thanks Jeff.

However, are you sure TaskRunner.run() is also used in the new API? I used BTrace to trace the function calls but did not see this function being called
anywhere.


One more question about the distributed cache. After I call
DistributedCache.purgeCache, I would expect the local cached files to be deleted or invalidated. However, when I run the same job multiple times with the purge operation at the end, I find that the local files are never deleted and their
modification time is still that of the first job run. How can I force my job to
redistribute the cache files anyway?

Thanks,
-Gang




----- Original Message ----
From: Jeff Zhang <zjf...@gmail.com>
To: common-dev@hadoop.apache.org
Sent: 2010/8/20 (Fri) 11:22:49 AM
Subject: Re: where distributed cache start working

Hi Gang,

In the TaskRunner's run() method, Hadoop downloads the cache files
that you set on the client side to the local disk, so the forked child JVM
can then use these cache files locally.



On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <lgpub...@yahoo.com.cn> wrote:
Hi all,
I went through the code but couldn't find the place where the distributed cache
starts working. I want to know, between DistributedCache.addCacheFile on the
master node and DistributedCache.getLocalCacheFiles on the client side, when and
where the files get distributed.


Thanks,
-Gang








--
Best Regards

Jeff Zhang




