I find this answer unsatisfying and think it could use some elaboration. I think it's there because of the Java OutputStream convention and not so much because of Hadoop.
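As a quick illustration of that convention (this is a made-up sketch, not HBase code; class name and buffer size are arbitrary): DataOutputStream delegates flush() to the stream it wraps, a BufferedOutputStream only hands bytes to the raw stream on flush() or when its buffer fills, and the base OutputStream.flush() is a documented no-op.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for the raw stream (DFSOutputStream in the real stack).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        // Typical layering: raw stream -> buffer -> primitive-typed writer.
        DataOutputStream out =
                new DataOutputStream(new BufferedOutputStream(sink, 8192));

        out.writeInt(42); // sits in the 8 KB buffer; nothing reaches sink yet
        System.out.println("before flush: " + sink.size()); // prints 0

        // DataOutputStream.flush() -> BufferedOutputStream.flush() -> sink
        out.flush();
        System.out.println("after flush: " + sink.size());  // prints 4

        // A raw OutputStream that does not override flush() inherits the
        // base-class no-op, so flush() on it silently does nothing.
        OutputStream noop = new OutputStream() {
            @Override public void write(int b) { /* discard */ }
            // flush() inherited from OutputStream: a no-op
        };
        noop.flush(); // harmless
    }
}
```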
The output object in ProtobufLogWriter is an HDFS FSDataOutputStream. The HDFS FSDataOutputStream essentially wraps a Java OutputStream [1] (which has only write(byte[]) and write(int) methods), providing a Java DataOutputStream [2] object that offers convenient writeXxx methods for serializing primitive datatypes (int, float, etc). For efficiency, you'd usually wrap the OutputStream with a BufferedOutputStream [3], which adds an in-memory buffer and flushes to the underlying output stream when a certain size is reached or flush() is called. Since it gets the stream from the FS object, I bet it could have implementations other than just the DFSOutputStream you saw -- ones which do require the flush.

Jon.

[1] http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html
[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
[3] http://docs.oracle.com/javase/7/docs/api/java/io/BufferedOutputStream.html

On Thu, Nov 7, 2013 at 2:56 PM, Ted Yu <[email protected]> wrote:
> Himanshu:
> See
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#DataOutputStream.flush%28%29
>
> The flush() call results in OutputStream.flush().
>
> Cheers
>
> On Mon, Nov 4, 2013 at 9:11 PM, Himanshu Vashishtha <[email protected]> wrote:
>
> > Looking at the ProtobufLogWriter class, it looks like the call to flush() in
> > the sync method is a no-op.
> >
> > https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134
> >
> > The underlying output stream is DFSOutputStream, which doesn't implement
> > flush().
> >
> > And, it calls sync() anyway, which ensures the data is written to the DNs'
> > (cache).
> >
> > Previously, with SequenceFile$Writer, it writes data to the output stream
> > (using Writables#write), and invokes sync/hflush.
> >
> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
> >
> > Is there a reason we have this call here? Please let me know if I miss any
> > context.
> >
> > Thanks,
> > Himanshu

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
