That seems interesting. We use the default replication factor of 3. Is there a way to set a replication factor of, let's say, 1 for job-specific files only?
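
One thing that might work, as far as I can tell: dfs.replication is read on the client side when a file is created, so setting it in the job's own configuration should affect only the files that job writes and leave the cluster default at 3. Below is a minimal, untested sketch using the old org.apache.hadoop.mapred API. The class name LowReplicationJob and the use of args[0]/args[1] as input and output paths are my own placeholders, and the mapper and reducer are left at the identity defaults just to keep it short.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver class, for illustration only.
public class LowReplicationJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LowReplicationJob.class);
        conf.setJobName("low-replication-output");

        // Assumption: this per-job override is picked up by the tasks when
        // they create their output files; the cluster-wide default is untouched.
        conf.setInt("dfs.replication", 1);

        // Placeholder paths taken from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Mapper and reducer are not set, so the identity defaults are used.
        JobClient.runJob(conf);
    }
}
```

If that is right, the trade-off would be the one Owen describes below: with a single replica, losing a node means re-running the job.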

2009/4/2 Owen O'Malley <[email protected]>:
>
> On Apr 2, 2009, at 2:41 AM, andy2005cst wrote:
>
>>
>> I need to use the output of the reduce, but I don't know how to do.
>> use the wordcount program as an example if i want to collect the wordcount
>> into a hashtable for further use, how can i do?
>
> You can use an output format and then an input format that uses a database,
> but in practice, the cost of writing to hdfs and reading it back is not a
> problem, especially if you set the replication of the output files to 1.
> (You'll need to re-run the job if you lose a node, but it will be fast.)
>
> -- Owen
>

--
M. Raşit ÖZDAŞ
