Hello,
I am a bit confused about the local directories where each map/reduce task
can store data.
According to what I have read,
dfs.data.dir is the path on the local file system where a DataNode
instance stores its data. That is, since we have a number of
individual nodes, this is the place where each node stores its own data.
Right?
This data may be part of, let's say, a file stored under the HDFS namespace?
The value of this property for my configuration is:
/home/bon/my_hdfiles/temp_0.19.1/dfs/data.
As far as I understand, this path refers to the local disk of each node.
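For what it's worth, dfs.data.dir may also hold a comma-separated list of several local directories, in which case the DataNode spreads its blocks across all of them. A minimal plain-Java sketch of how I read that value (the class name DataDirs is my own, not a Hadoop class; the directory string is just the value from my configuration):

```java
import java.util.Arrays;
import java.util.List;

public class DataDirs {
    // The dfs.data.dir value from my configuration; it could also be a
    // comma-separated list such as "/disk1/dfs/data,/disk2/dfs/data".
    static final String DFS_DATA_DIR = "/home/bon/my_hdfiles/temp_0.19.1/dfs/data";

    // Split a dfs.data.dir value into its individual local directories.
    static List<String> parse(String value) {
        return Arrays.asList(value.split(","));
    }

    public static void main(String[] args) {
        for (String dir : parse(DFS_DATA_DIR)) {
            // Each entry is a directory on the node's local file system.
            System.out.println("DataNode stores blocks under: " + dir.trim());
        }
    }
}
```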
Moreover, calling FileOutputFormat.getWorkOutputPath(job) we obtain the Path
to the task's temporary output directory for the map/reduce job. This path
is totally different from the previous one, which confuses me, since the
temporary output of each task should be written locally to the node's disk.
The path I retrieve is:
hdfs://localhost:9000/user/bon/keys_fil.txt/_temporary/_attempt_200907011515_0009_m_000000_0
Does this path refer to the local disk of the node, or could it refer to
another node in the cluster?
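One thing I did notice: the returned path carries a URI scheme, which says which file system it belongs to. A minimal plain-Java sketch of how one could inspect it (no Hadoop classes involved; PathCheck is an illustrative name of my own, and the path string is the one printed above):

```java
import java.net.URI;

public class PathCheck {
    // The work-output path returned by FileOutputFormat.getWorkOutputPath(job),
    // copied verbatim from my job above.
    static final String WORK_PATH =
        "hdfs://localhost:9000/user/bon/keys_fil.txt/_temporary/_attempt_200907011515_0009_m_000000_0";

    // Return the URI scheme ("hdfs", "file", ...) of a path string.
    static String scheme(String path) {
        return URI.create(path).getScheme();
    }

    public static void main(String[] args) {
        URI uri = URI.create(WORK_PATH);
        // "hdfs" means the path lives in the distributed HDFS namespace,
        // served by the NameNode at the given authority; a node-local
        // file-system path would instead use the "file" scheme.
        System.out.println("scheme    = " + scheme(WORK_PATH));
        System.out.println("authority = " + uri.getAuthority());
        System.out.println("path      = " + uri.getPath());
    }
}
```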
Any clarification would be of great help.
Thank you.
--
View this message in context:
http://www.nabble.com/local-directory-tp24292289p24292289.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.