On Nov 5, 2008, at 2:21 PM, Tarandeep Singh wrote:
I want to know whether the key/value pairs received by a particular
reducer at a node are stored locally on that node or on DFS (and hence
replicated across the cluster according to the replication factor set
by the user).
Map outputs (and reduce inputs) are stored on local disk, not in HDFS.
The data is moved between nodes via HTTP.
One more question: how does the framework replicate the data? Say Node A
writes a file; is it guaranteed that at least one copy will be stored
on Node A?
HDFS writes all of the replicas in parallel, through a pipeline of
DataNodes. For the most part, it writes to the local (same node)
DataNode first, that DataNode sends it to a DataNode on another rack,
and that DataNode sends it to a third DataNode on that same remote rack.
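
The placement order described above can be sketched roughly as follows.
This is only an illustration, not HDFS code: the function name
choose_replica_nodes, the topology dictionary, and the fallback used
when no second rack exists are all assumptions made for the example.

```python
import random


def choose_replica_nodes(writer, topology, rng=random):
    """Pick up to three DataNodes for one block (illustrative sketch only).

    writer   -- name of the node writing the file (hypothetical)
    topology -- dict mapping rack name -> list of DataNode names (hypothetical)
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # First replica: the local DataNode if the writer runs on one,
    # otherwise (assumption) a randomly chosen DataNode.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))

    # Second replica: a DataNode on a different rack from the first.
    remote_racks = [r for r in sorted(topology)
                    if r != rack_of[first] and topology[r]]
    if not remote_racks:
        # Assumed fallback for a single-rack cluster: any other nodes.
        others = [n for n in sorted(rack_of) if n != first]
        return [first] + rng.sample(others, min(2, len(others)))

    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])

    # Third replica: another DataNode on the same rack as the second.
    candidates = [n for n in topology[remote_rack] if n != second]
    if not candidates:
        # Assumed fallback: any node not already chosen.
        candidates = [n for n in sorted(rack_of) if n not in (first, second)]

    pipeline = [first, second]
    if candidates:
        pipeline.append(rng.choice(candidates))
    return pipeline


if __name__ == "__main__":
    racks = {"rack1": ["A", "B"], "rack2": ["C", "D"]}
    print(choose_replica_nodes("A", racks))
```

With writer "A" on rack1 and two nodes on rack2, the result starts with
"A", followed by the two rack2 nodes in some order. If the writer is not
a DataNode, there is no local copy to prefer, which is why the answer
says "for the most part".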
-- Owen