Thanks, Harsh J. Your answer is quite helpful! If I understand correctly, a write waits until all replicas are created as long as no error occurs during replication. If an error occurs in the replication pipeline, dfs.replication.min comes into play: the write still succeeds provided at least that many replicas were written. Is my understanding correct?

Gerald

On Thu, Jan 26, 2012 at 4:07 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi,
>
> On Fri, Jan 27, 2012 at 12:27 AM, Zhenhua (Gerald) Guo <jen...@gmail.com> wrote:
>> I have two questions regarding creation of replicas.
>> - When a user uploads a file to HDFS, it returns whenever the first
>> replica is created? or the client needs wait until all replicas are
>> created?
>> - When the output of MapReduce jobs is written to HDFS (by reduce
>> tasks), the writing of output returns when the first replica is
>> created? or wait until all replicas are created?
>
> Both questions are the same as both do the same form of DFS write.
>
> Writes are synchronous and replication is pipelined, presently in Apache
> Hadoop.
>
> But a write will succeed if at least 1 replica was written (controlled
> via dfs.replication.min -- pipeline can lose DNs out of errors, or can
> get fewer than requested DNs cause of load/space issues, but write
> will succeed if it at least gets one DN)
>
> Also see the whole conversation at
> http://search-hadoop.com/m/bF99W1ZmNqz1 for some more tidbits you
> might find interesting.
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
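
To make the rule discussed in this thread concrete, here is a toy model of it in Python. It is only a sketch of the success condition as described above, not Hadoop code: the function names `write_succeeds` and `simulate_pipeline` are hypothetical, and the model assumes every DataNode that does not fail acknowledges its replica.

```python
def write_succeeds(acked_replicas: int, min_replication: int = 1) -> bool:
    """Toy check: the write counts as successful if at least
    `min_replication` replicas were written (the role the thread
    attributes to dfs.replication.min)."""
    if min_replication < 1:
        raise ValueError("min_replication must be at least 1")
    return acked_replicas >= min_replication


def simulate_pipeline(requested: int, failed_datanodes: int,
                      min_replication: int = 1) -> tuple[int, bool]:
    """Hypothetical helper: drop failed DataNodes from the pipeline
    and report (replicas written, whether the write succeeds)."""
    # Assumption: each surviving DataNode in the pipeline writes one replica.
    acked = max(requested - failed_datanodes, 0)
    return acked, write_succeeds(acked, min_replication)


if __name__ == "__main__":
    # No errors: the client waits for the full pipeline.
    print(simulate_pipeline(requested=3, failed_datanodes=0))
    # Two DataNodes lost: one replica is still enough when the minimum is 1.
    print(simulate_pipeline(requested=3, failed_datanodes=2))
    # Every DataNode lost: the write fails.
    print(simulate_pipeline(requested=3, failed_datanodes=3))
```

With a minimum of 1, the write in this model fails only when no DataNode in the pipeline survives; raising the minimum makes the same failures fatal sooner.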