Hi Mohit,

> In this scenario is data also replicated as defined by the replication factor
> to other nodes as well? I am wondering if at this point if crash occurs do I
> have data in other nodes?

What kind of crash are you talking about here? A client crash or a cluster
crash? If a cluster, is the loss you're thinking of one DN or all the
replicating DNs?

If a client fails to close a file due to a crash, it is auto-closed later
(default is one hour) by the NameNode, and whatever the client successfully
wrote (i.e. into its last block) is then made available to readers at that
point. If the client synced, then its last sync point is always available to
readers, and whatever it didn't sync is made available when the file is closed
later by the NN. For DN failures, read on.

Replication in 1.x/0.20.x is done via pipelines, and it's done regardless of
sync() calls. All write packets are indeed sent to and acknowledged by each DN
in the constructed pipeline as the write progresses. For a good diagram of the
sequence here, see Figure 3.3, page 66, Chapter 3 (The Hadoop Distributed
Filesystem) in Tom's "Hadoop: The Definitive Guide" (2nd ed. page nos. Gotta
get the 3rd ed. soon :)). The sync behavior is further explained under the
'Coherency Model' title on page 68 of the same chapter.

Think of sync() more as a checkpoint done over the write pipeline, such that
new readers can immediately read up to the length of synced bytes, and those
bytes are guaranteed to be outside of the DN application (JVM) buffers (i.e.
flushed).
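To illustrate (a minimal sketch against the 1.x-era API; the path and data
here are made up, and you'd need the Hadoop jars on the classpath):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SyncSketch {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataOutputStream out = fs.create(new Path("/tmp/sync-example"));

      out.write("first batch\n".getBytes()); // pipelined to all DNs as written
      out.sync(); // checkpoint: new readers are now guaranteed to see this

      out.write("second batch\n".getBytes()); // also pipelined to all DNs,
                                              // but new readers are only
                                              // guaranteed the synced length
                                              // until the file is closed
      out.close(); // everything becomes visible once the file is closed
    }
  }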
Some further notes, for general info:

In the 0.20.x/1.x releases, there's no hard guarantee that the write buffer
flushing done via sync() ensures the data went to the *disk*. It may remain in
the OS buffers (a feature in OSes, for performance). This is because we do not
do an fsync() (i.e. calling force on the FileChannel for the block and
metadata outputs), but rather just an output stream flush.

In the future, from the 2.0.1-alpha release (soon to come at this point)
onwards, the specific call hsync() will ensure that this is not the case.
However, if you are OK with the OS buffers feature/caveat and primarily need
syncing not for reliability but for readers, you may use the call hflush() and
save on performance. One place where hsync() is to be preferred over hflush()
is where you use WALs (for data reliability), and HBase is one such
application. With hsync(), HBase can survive potential failures caused by
major power failure cases (among others). There's a small sketch of this
choice below the quoted mail.

Let us know if this clears it up for you!

On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> I am wondering the role of sync in replication of data to other nodes. Say
> client writes a line to a file in Hadoop, at this point file handle is open
> and sync has not been called. In this scenario is data also replicated as
> defined by the replication factor to other nodes as well? I am wondering if
> at this point if crash occurs do I have data in other nodes?

--
Harsh J
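P.S. The hflush()/hsync() choice, as a minimal sketch against the 2.x API
(the path and record are made up; the comments state my understanding of the
guarantees):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HsyncSketch {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataOutputStream out = fs.create(new Path("/tmp/wal-example"));

      out.write("wal record\n".getBytes());

      // Reader visibility only (cheaper): data leaves the DN JVM buffers,
      // but may still sit in OS buffers and be lost on a power failure.
      out.hflush();

      // Durability (e.g. for a WAL, as in HBase): additionally has the DN
      // sync the block data to disk before acknowledging.
      out.hsync();

      out.close();
    }
  }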