Distributed Cache

zhangguoping zhangguoping Tue, 06 Jul 2010 01:54:20 -0700

From the book "Hadoop: The Definitive Guide", p. 242:
>>
When you launch a job, Hadoop copies the files specified by the -files and
-archives options to the jobtracker’s filesystem (normally HDFS). Then, before
a task is run, the tasktracker copies the files from the jobtracker’s
filesystem to a local disk—the cache—so the task can access the files.
>>
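
To make the two copy steps in the quoted passage concrete, here is a toy model in plain Python. It is not Hadoop code and does not use any Hadoop API: the directory names (client, shared_fs, node_cache) and the function names (submit_job, localize) are made up for illustration, and ordinary temporary directories stand in for HDFS and the tasktracker's local disk.

```python
import shutil
import tempfile
from pathlib import Path


def submit_job(client_files, shared_fs):
    """Step 1 (job launch): copy the -files entries to the shared filesystem."""
    staged = []
    for src in client_files:
        dst = Path(shared_fs) / Path(src).name
        shutil.copy(src, dst)
        staged.append(dst)
    return staged


def localize(staged_files, local_cache):
    """Step 2 (before a task runs): copy from the shared filesystem to local disk."""
    local = []
    for src in staged_files:
        dst = Path(local_cache) / src.name
        shutil.copy(src, dst)
        local.append(dst)
    return local


def demo():
    # Hypothetical layout: three directories under one temporary root.
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        client = root / "client"
        shared = root / "shared_fs"
        cache = root / "node_cache"
        for d in (client, shared, cache):
            d.mkdir()

        # A side file that exists only on the submitting machine.
        lookup = client / "lookup.txt"
        lookup.write_text("key\tvalue\n")

        staged = submit_job([lookup], shared)
        local = localize(staged, cache)

        # The task reads the node-local copy, not the client's original.
        return local[0].read_text()


if __name__ == "__main__":
    print(demo())
```

In this sketch the first copy matters only because lookup.txt starts out on the client machine; the second copy is what gives the task a plain local file to open.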


I wonder why Hadoop copies the files to the jobtracker's filesystem.
Since they are already in HDFS, they should already be available to the tasks.
Is there a particular reason for this extra copy?
