Local disk is NOT reliable. As I mentioned, a machine can die or a map task can be run multiple times on different machines.
If you want reliable storage, you have to use HDFS. You can write to local disk during the execution of a task, but you need to move the data to HDFS before the task completes if you want it to be permanently stored.

The fact that local disk is not reliable is inherent in the way that Hadoop works (and, in fact, inherent in the way large-scale computing works). If this seems surprising to you, you need to work through the basic assumptions again.

On Wed, Jul 1, 2009 at 1:46 PM, bonito perdo <[email protected]> wrote:

> Since i write them locally to each node, how is it possible? What I want is
> not losing data.
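To make the "move it to HDFS before the task completes" step concrete, here is a rough sketch rather than a definitive recipe. It assumes the `hadoop` command-line client is on the PATH of the task node and shells out to `hadoop fs -put`. The function names and the example paths are made up for illustration and are not from the original discussion.

```python
import subprocess
from pathlib import Path


def build_put_command(local_path, hdfs_dir):
    """Return the `hadoop fs -put` argument list that copies local_path into hdfs_dir."""
    return ["hadoop", "fs", "-put", str(local_path), hdfs_dir]


def persist_to_hdfs(local_path, hdfs_dir, runner=subprocess.run):
    """Copy a task-local file to HDFS and raise if the copy fails.

    Call this before the task exits; anything left only on local disk
    can be lost when the machine dies.
    """
    local_path = Path(local_path)
    if not local_path.is_file():
        raise FileNotFoundError(local_path)
    # check=True makes subprocess.run raise CalledProcessError on a non-zero
    # exit status, so a failed copy is not silently ignored.
    runner(build_put_command(local_path, hdfs_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths: a scratch file written during the task and a target HDFS directory.
    persist_to_hdfs("/tmp/task-output/part-00000", "/user/demo/output")
```

Because a task can be run more than once on different machines, a real job would also need destination names that do not collide between attempts. The sketch leaves that out.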
