Hello,

I see that when HDFSBolt syncs, it takes advantage of the fact that it has a
direct handle to an HdfsDataOutputStream, using the following code:


if (this.out instanceof HdfsDataOutputStream) {
    ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
} else {
    this.out.hsync();
}

SequenceFileBolt, however, has a higher-level SequenceFile.Writer and so syncs
like this:

this.writer.hsync();


From looking at the implementation of hsync in DFSOutputStream (Hadoop 2.6.0),
it seems that without passing SyncFlag.UPDATE_LENGTH there is no guarantee
that namenode.fsync() gets called.
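To make that reading concrete, here is a toy model of the condition as I
understand it from DFSOutputStream.flushOrSync() in 2.6.0. This is my paraphrase,
so treat it as an assumption: the fsync RPC is issued only if a new block was
allocated since the last flush, or if UPDATE_LENGTH is passed. HsyncModel and its
methods are hypothetical names; this is not Hadoop code.

```java
import java.util.EnumSet;

// Hypothetical toy model, NOT Hadoop code: it paraphrases the condition I read in
// DFSOutputStream.flushOrSync() (Hadoop 2.6.0) for issuing namenode.fsync().
public class HsyncModel {

    // Stand-in for HdfsDataOutputStream.SyncFlag, for illustration only.
    enum SyncFlag { UPDATE_LENGTH }

    // Assumed to mirror the stream's "new block allocated since last flush" flag.
    private boolean persistBlocks = false;

    // Called when the writer crosses into a new HDFS block.
    void onNewBlockAllocated() {
        persistBlocks = true;
    }

    // Returns true if this hsync() would issue the namenode.fsync() RPC.
    boolean hsync(EnumSet<SyncFlag> flags) {
        boolean updateLength = flags.contains(SyncFlag.UPDATE_LENGTH);
        // The block flag is cleared on every sync, like getAndSet(false).
        boolean newBlocks = persistBlocks;
        persistBlocks = false;
        return newBlocks || updateLength;
    }

    public static void main(String[] args) {
        HsyncModel out = new HsyncModel();
        EnumSet<SyncFlag> none = EnumSet.noneOf(SyncFlag.class);
        System.out.println(out.hsync(none));                               // false
        out.onNewBlockAllocated();
        System.out.println(out.hsync(none));                               // true
        System.out.println(out.hsync(none));                               // false
        System.out.println(out.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))); // true
    }
}
```

If that reading is right, a plain hsync() only calls namenode.fsync() after a
block boundary has been crossed, which would match the intermittent behavior I
describe below.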

Was that flag added to HDFSBolt to ensure that fsync() is called every time?
When I sync my SequenceFileBolt, I don’t always see additional data written to
the HDFS file, whereas I do see it every single time HDFSBolt syncs.

It seems that to get the same behavior, which is what I want, I have to close
the SequenceFile and then reopen it. That should work, but at a performance
cost.

I would appreciate any feedback on my analysis above or on my proposed solution.


Thanks!
