Hi Mohit,

> In this scenario is data also replicated as defined by the replication factor 
> to other nodes as well? I am wondering if at this point if crash occurs do I 
> have data in other nodes?

What kind of crash are you talking about here -- a client crash or a
cluster crash? If a cluster crash, is the loss you're thinking of a
single DN or all the replicating DNs?

If a client fails to close a file due to a crash, the file is
auto-closed later (after one hour, by default) by the NameNode, and
whatever the client successfully wrote (i.e., into its last block) is
then made available to readers. If the client synced, then its last
sync point is always available to readers, and whatever it didn't sync
is made available when the file is later closed by the NN. For DN
failures, read on.

Replication in 1.x/0.20.x is done via pipelines, and it's done
regardless of sync() calls. All write packets are indeed sent to and
acknowledged by each DN in the constructed pipeline as the write
progresses. For a good diagram of the sequence, see Figure 3.3, page
66, Chapter 3: The Hadoop Distributed Filesystem, in Tom's "Hadoop:
The Definitive Guide" (2nd ed. page nos. Gotta get the 3rd ed. soon :))

The sync behavior is further explained under the 'Coherency Model'
heading on page 68, Chapter 3: The Hadoop Distributed Filesystem, of
the same book. Think of sync() more as a checkpoint done over the
write pipeline: new readers can immediately read up to the synced
length, and those bytes are guaranteed to be out of the DN
application (JVM) buffers (i.e., flushed).

Some further notes, for general info: in the 0.20.x/1.x releases,
there's no hard guarantee that the write-buffer flushing done via
sync() gets the data all the way to the *disk*. It may remain in the
OS buffers (an OS feature, for performance). This is because we do
not do an fsync() (i.e., call force() on the FileChannel for the
block and metadata outputs), but rather just an output-stream flush.
Going forward, from the 2.0.1-alpha release (soon to come at this
point) onwards, the dedicated hsync() call will ensure that this is
not the case.
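To make the flush-vs-fsync distinction concrete outside of HDFS, here's a plain-Java sketch of the same two durability levels (this uses java.io only, not the HDFS API -- flush() and FileDescriptor.sync() stand in for what hflush() and hsync() do on the DN side):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class FlushVsSync {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("flush-demo", ".dat");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("record-1\n".getBytes("UTF-8"));

            // Level 1: flush() empties the application (JVM) buffers into
            // the OS. Other readers can now see the bytes, but a power
            // failure could still lose them -- the 1.x sync()/hflush() level.
            out.flush();

            // Level 2: force the OS to push its buffers to the disk itself.
            // This is the fsync() that 1.x sync() skips and hsync() adds.
            out.getFD().sync();
        }
        System.out.println("durable length: " + f.length());
    }
}
```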

However, if you are OK with the OS-buffers caveat and primarily need
syncing for readers rather than for reliability, you may use the
hflush() call and save the performance cost. One place where hsync()
is to be preferred over hflush() is where you use WALs for data
reliability, and HBase is one such application. With hsync(), HBase
can survive failures caused by, among other things, major power loss.
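As a rough sketch of that WAL pattern (plain java.io again, standing in for the HDFS output stream -- the class and method names here are illustrative, not HBase's actual API):

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal write-ahead-log sketch: append() returns to the caller only
// after the record has been fsync'd, so any acknowledged write survives
// a power loss. With only flush(), an acknowledged record could still
// sit in OS buffers -- the gap hsync() closes for HBase.
public class MiniWal {
    private final FileOutputStream out;

    public MiniWal(String path) throws IOException {
        this.out = new FileOutputStream(path, true); // append mode
    }

    // Returns only once the record is on disk (the hsync()-style guarantee).
    public void append(byte[] record) throws IOException {
        out.write(record);
        out.flush();        // JVM buffers -> OS (hflush()-level)
        out.getFD().sync(); // OS buffers -> disk (hsync()-level)
    }

    public void close() throws IOException {
        out.close();
    }
}
```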

Let us know if this clears it up for you!

On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> I am wondering the role of sync in replication of data to other nodes. Say
> client writes a line to a file in Hadoop, at this point file handle is open
> and sync has not been called. In this scenario is data also replicated as
> defined by the replication factor to other nodes as well? I am wondering if
> at this point if crash occurs do I have data in other nodes?



-- 
Harsh J
