On Thu, Nov 11, 2010 at 11:55 AM, Hairong Kuang <kuang.hair...@gmail.com> wrote:
> A few clarifications on API2 semantics.
>
> 1. The ack gets sent back to the client before a packet gets written to
> local files.

Ah, I see in trunk this is the case. In 0.20-append, it's the other way
around - we only enqueue after flush.
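Roughly, the difference looks like this (a simplified sketch only; Packet,
ackQueue, dataOut, and checksumOut are illustrative names, not the actual
BlockReceiver internals):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Simplified model of the per-packet ordering difference.
    class PacketOrderingSketch {
        static class Packet { long seqno; byte[] data; byte[] crc; }

        private final Queue<Long> ackQueue = new ConcurrentLinkedQueue<>();
        private final OutputStream dataOut;      // block file stream
        private final OutputStream checksumOut;  // meta (crc) file stream

        PacketOrderingSketch(OutputStream dataOut, OutputStream checksumOut) {
            this.dataOut = dataOut;
            this.checksumOut = checksumOut;
        }

        // Trunk: enqueue for ack first, so the ack thread may reply to the
        // client before the bytes have reached the OS buffer cache.
        void receivePacketTrunk(Packet pkt) throws IOException {
            ackQueue.add(pkt.seqno);     // (1) eligible for ack immediately
            dataOut.write(pkt.data);     // (2) write data/crc to the files
            checksumOut.write(pkt.crc);
            dataOut.flush();             // (3) flush to the OS buffer cache
            checksumOut.flush();         //     (not an fsync to the media)
        }

        // 0.20-append: enqueue only after the flush, so an acked packet is
        // at least in the OS buffer cache on this DataNode.
        void receivePacket020Append(Packet pkt) throws IOException {
            dataOut.write(pkt.data);
            checksumOut.write(pkt.crc);
            dataOut.flush();
            checksumOut.flush();
            ackQueue.add(pkt.seqno);
        }
    }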
> 2. Data becomes visible to new readers on the condition that at least one
> DataNode does not have an error.
>
> 3. The reason that the flush is done after a write is more for the purpose
> of implementation simplification. Currently readers do not read from the
> DataNode's buffer; they only read from the system buffer. A flush makes the
> data visible to readers sooner.
>
> Hairong
>
> On 11/11/10 7:31 AM, "Thanh Do" <than...@cs.wisc.edu> wrote:
>
>> Thanks Todd,
>>
>> In HADOOP-6313, I see three APIs (sync, hflush, hsync), and I assume
>> hflush corresponds to:
>>
>> *"API2: flushes out to all replicas of the block. The data is in the
>> buffers of the DNs but not on the DN's OS buffers. New readers will see
>> the data after the call has returned."*
>>
>> I am still confused. Once the client calls hflush, it will wait for all
>> outstanding packets to be acked before sending subsequent packets. But at
>> the DataNode, it is possible that the ack to the client is sent before
>> the data and checksum are written to the replica. So if the DataNode
>> crashes just after sending the ack and before writing to the replica,
>> will the semantics be violated here?
>>
>> Thanks
>> Thanh
>>
>> On Wed, Nov 10, 2010 at 11:11 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> Nope, flush just flushes the Java-side buffer to the Linux buffer
>>> cache -- not all the way to the media.
>>>
>>> Hsync is the API that will eventually go all the way to disk, but it
>>> has not yet been implemented.
>>>
>>> -Todd
>>>
>>> On Wednesday, November 10, 2010, Thanh Do <than...@cs.wisc.edu> wrote:
>>>> Or, another way to rephrase my question: do data.flush and
>>>> checksumOut.flush guarantee that the data is synchronized with the
>>>> underlying disk, just like fsync()?
>>>>
>>>> Thanks
>>>> Thanh
>>>>
>>>> On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do <than...@cs.wisc.edu> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> After reading appendDesign3.pdf in HDFS-265 and looking at the
>>>>> BlockReceiver.java code in 0.21.0, I am confused by the following.
>>>>>
>>>>> The document says that:
>>>>>
>>>>> *For each packet, a DataNode in the pipeline has to do 3 things.
>>>>> 1. Stream data
>>>>>    a. Receive data from the upstream DataNode or the client
>>>>>    b. Push the data to the downstream DataNode if there is any
>>>>> 2. Write the data/crc to its block file/meta file.
>>>>> 3. Stream ack
>>>>>    a. Receive an ack from the downstream DataNode if there is any
>>>>>    b. Send an ack to the upstream DataNode or the client*
>>>>>
>>>>> And *"...there is no guarantee on the order of (2) and (3)"*.
>>>>>
>>>>> In BlockReceiver.receivePacket(), after reading the packet buffer, the
>>>>> DataNode does:
>>>>> 1) put the packet seqno in the ack queue
>>>>> 2) write the data and checksum to disk
>>>>> 3) flush the data and checksum (to disk)
>>>>>
>>>>> What confuses me is that the streaming of the ack does not necessarily
>>>>> depend on whether the data has been flushed to disk or not. My
>>>>> question is then: why does the DataNode need to flush the data and
>>>>> checksum every time it receives a packet? This flush may be costly.
>>>>> Why can't the DataNode just batch several writes (after receiving
>>>>> several packets) and flush them all at once? Is there any particular
>>>>> reason for doing so?
>>>>>
>>>>> Can somebody clarify this for me?
>>>>>
>>>>> Thanks so much.
>>>>> Thanh
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
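To make the flush-vs-sync distinction above concrete, here is a minimal
standalone Java sketch (plain java.io, outside HDFS; the file name is made
up): flush() only moves the Java-side buffer into the OS buffer cache, while
FileDescriptor.sync() is the fsync-style call that forces the bytes to the
media.

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class FlushVsSync {
        public static void main(String[] args) throws IOException {
            FileOutputStream fos = new FileOutputStream("block.tmp");
            BufferedOutputStream out = new BufferedOutputStream(fos);
            try {
                out.write("some packet bytes".getBytes(StandardCharsets.UTF_8));

                // flush(): pushes the Java-side buffer into the OS (the
                // Linux buffer cache). Other readers of the file can now
                // see the bytes, but a power failure could still lose them.
                out.flush();

                // fsync-style call: blocks until the OS has written the
                // buffered data to the storage media.
                fos.getFD().sync();
            } finally {
                out.close();  // closes the underlying FileOutputStream too
            }
        }
    }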