Hi JD, I have more question in regards to HLog and more concretely the underlaying SequenceFileLogWriter. This is what is called from HLog (assuming that is current, you said you changed things to hflush() in the replication context?):
@Override public void sync() throws IOException { this.writer.sync(); if (this.writer_out != null) { this.writer_out.sync(); } } The first sync calls SequenceFile.Writer.sync() which is not at all what we want, i.e. the hflush or flush/sync in general but setting a sync marker in the file. What purpose does that have and why is that needed here? The second call is the deprecated FSDataOutputStream.sync() which does the actual hflush() internally. Just wondering what the sync is for really. Looking at the old 0.20 call public void sync() throws IOException { lastLogFlushTime = System.currentTimeMillis(); if (this.append && syncfs != null) { try { this.syncfs.invoke(this.writer, NO_ARGS); } catch (Exception e) { throw new IOException("Reflection", e); } } else { this.writer.sync(); } this.unflushedEntries.set(0); syncTime += System.currentTimeMillis() - lastLogFlushTime; syncOps++; } That is calling thew actual syncfs aka hflush if available otherwise also the above sync() on SequenceFile.Writer, which did and does this (as noted above): /** create a sync point */ public void sync() throws IOException { if (sync != null && lastSyncPos != out.getPos()) { out.writeInt(SYNC_ESCAPE); // mark the start of the sync out.write(sync); // write sync lastSyncPos = out.getPos(); // update lastSyncPos } } Well, what that does I can see in the SequenceFile.Reader.readRecordLength(). But that certainly does no OS sync (or fsync or syncfs) nor hflush nor even a stream flush. Why are we calling it? It is called internally anyways after every few hundred bytes it says here synchronized void checkAndWriteSync() throws IOException { if (sync != null && out.getPos() >= lastSyncPos+SYNC_INTERVAL) { // time to emit sync sync(); } } Just when a KV is say 10KB then it is basically in between nearly every KV (unless the HLogKey + KeyValue is minute). So overall I think we should *not* call writer.sync() unless it is Syncable and then call hflush() instead? Lars