From the book "Hadoop: The Definitive Guide", p. 242:

> When you launch a job, Hadoop copies the files specified by the -files and -archives options to the jobtracker’s filesystem (normally HDFS). Then, before a task is run, the tasktracker copies the files from the jobtracker’s filesystem to a local disk—the cache—so the task can access the files.

I wonder why Hadoop copies the files to the jobtracker's filesystem at all. If the files are already in HDFS, they should already be available to the tasks. What is the reasoning behind this extra copy?
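
To make the question concrete, here is a minimal sketch of the two copy steps the quoted passage describes. It is a toy model in plain Python, not Hadoop code: the function names `submit_job` and `localize`, the file `lookup.txt`, and the directory names are all made up for illustration, and ordinary temporary directories stand in for the jobtracker's filesystem and the tasktracker's local disk.

```python
import shutil
import tempfile
from pathlib import Path


def submit_job(side_file: Path, shared_fs: Path) -> Path:
    """Step 1 (job launch): copy the side file into the shared filesystem.

    `shared_fs` is a hypothetical stand-in for the jobtracker's filesystem.
    """
    staged = shared_fs / side_file.name
    shutil.copy(side_file, staged)
    return staged


def localize(staged: Path, local_cache: Path) -> Path:
    """Step 2 (before a task runs): copy from the shared filesystem
    to the node's local disk, i.e. the cache."""
    local_cache.mkdir(parents=True, exist_ok=True)
    cached = local_cache / staged.name
    shutil.copy(staged, cached)
    return cached


def run_demo() -> str:
    """Run both steps in a throwaway directory tree and return what the
    task would read from its local cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        client_dir = root / "client"          # where the job is launched from
        shared_fs = root / "shared_fs"        # stand-in for the jobtracker's filesystem
        node_cache = root / "node1" / "cache" # stand-in for a tasktracker's local disk
        client_dir.mkdir()
        shared_fs.mkdir()

        side_file = client_dir / "lookup.txt"
        side_file.write_text("k1\tv1\n")

        staged = submit_job(side_file, shared_fs)
        cached = localize(staged, node_cache)
        return cached.read_text()


if __name__ == "__main__":
    print(run_demo())
```

In this model the task only ever reads the copy produced by `localize`. The question is about the first step, `submit_job`.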
