Moving to mapreduce-user@, bcc common-...@. Please use the
project-specific lists.
DistributedCache.purgeCache isn't a public API. You shouldn't be
calling it from the task.
A simple way of doing what you want is to change the mtime of the
cache files on HDFS.
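To illustrate why a changed mtime forces a fresh copy, here is a toy model in plain Java. It is a sketch, not Hadoop's code: the class MtimeCacheSketch and its methods localize, copyCount and touch are made up for this example, and it works on the local filesystem through java.nio rather than on HDFS. The assumption it encodes is that the cache remembers the source file's mtime when it copies the file, and copies again only when that mtime differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

/** Toy model of an mtime-keyed file cache; hypothetical, not Hadoop's implementation. */
public class MtimeCacheSketch {
    // Source mtime (in ms) recorded at the moment each file was last copied.
    private final Map<Path, Long> localizedMtime = new HashMap<>();
    private final Path localDir;
    private int copies = 0;

    public MtimeCacheSketch(Path localDir) {
        this.localDir = localDir;
    }

    /** Returns the local copy, re-copying only if the source mtime has changed. */
    public Path localize(Path source) throws IOException {
        long mtime = Files.getLastModifiedTime(source).toMillis();
        Path local = localDir.resolve(source.getFileName());
        Long seen = localizedMtime.get(source);
        if (seen == null || seen != mtime || !Files.exists(local)) {
            Files.copy(source, local, StandardCopyOption.REPLACE_EXISTING);
            localizedMtime.put(source, mtime);
            copies++;
        }
        return local;
    }

    /** Number of times a file was actually copied to the local directory. */
    public int copyCount() {
        return copies;
    }

    /** Sets the source file's mtime explicitly, leaving its content untouched. */
    public static void touch(Path source, long mtimeMillis) throws IOException {
        Files.setLastModifiedTime(source, FileTime.fromMillis(mtimeMillis));
    }
}
```

Calling localize twice on an unchanged file copies it once; calling touch with a different timestamp and then localize again triggers a second copy. On HDFS the corresponding call should be FileSystem.setTimes(path, mtime, atime), where passing -1 for atime leaves the access time alone. As far as I know the timestamp is recorded at job submission, so change the mtime before you submit the job, not while it is running.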
Arun
On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:
Thanks Jeff.
However, are you sure TaskRunner.run() is also used in the new API?
I used btrace to trace the function calls but didn't find this
function being called anywhere.
One more question about the distributed cache. After I call
DistributedCache.purgeCache, I would expect the local cached files to
be deleted or invalidated. However, when I run the same job multiple
times with the purge operation at the end, I find that the local files
are never deleted and their modification time is still that of the
first job run. How can I make my job redistribute the cache anyway?
Thanks,
-Gang
----- Original Message ----
From: Jeff Zhang <zjf...@gmail.com>
To: common-dev@hadoop.apache.org
Sent: 2010/8/20 (Fri) 11:22:49 AM
Subject: Re: where distributed cache start working
Hi Gang,
In the TaskRunner's run() method, Hadoop downloads the cache files
that you set on the client side to the local disk, so the forked
child JVM can then use these cache files locally.
On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <lgpub...@yahoo.com.cn> wrote:
Hi all,
I went through the code but couldn't find the place where the
distributed cache starts working. I want to know, between
DistributedCache.addCacheFile at the master node and
DistributedCache.getLocalCacheFiles at the client side, when and
where the files get distributed.
Thanks,
-Gang
--
Best Regards
Jeff Zhang