Hi Stack,

Comments inline below

Lars

On Fri, Jan 29, 2010 at 9:30 PM, Stack <st...@duboce.net> wrote:
> On Fri, Jan 29, 2010 at 5:17 AM, Lars George <lars.geo...@gmail.com> wrote:
>> �...@override
>>  public void sync() throws IOException {
>>    this.writer.sync();
>>    if (this.writer_out != null) {
>>      this.writer_out.sync();
>>    }
>>  }
>>
>> The first sync calls SequenceFile.Writer.sync() which is not at all
>> what we want, i.e. the hflush or flush/sync in general but setting a
>> sync marker in the file. What purpose does that have and why is that
>> needed here?
>
> Yes. SF.sync does not what you'd expect a sync to do, as you point out
> later in your mail.
>
> We continue to call it in TRUNK because its what we've always done and
> my thinking was, maybe the marker written to SF file will be of use.
> IIUC the marker is used to pick up the parse after we've crossed the
> corrupted section.

Makes sense. Also makes sense to think about a better suited file
format anyways but that is already underway, so this is more for
understanding the legacy code.

>> The second call is the deprecated FSDataOutputStream.sync() which does
>> the actual hflush() internally. Just wondering what the sync is for
>> really.
>
> Yeah, the call to FSDOS.sync is the important one.  Its deprecated but
> its doing the right thing (At the time of writing this was all that
> was available).

Understood.

>>
>> Looking at the old 0.20 call
>>
>>  public void sync() throws IOException {
>>    lastLogFlushTime = System.currentTimeMillis();
>>    if (this.append && syncfs != null) {
>>      try {
>>        this.syncfs.invoke(this.writer, NO_ARGS);
>>      } catch (Exception e) {
>>        throw new IOException("Reflection", e);
>>      }
>>    } else {
>>      this.writer.sync();
>>    }
>>    this.unflushedEntries.set(0);
>>    syncTime += System.currentTimeMillis() - lastLogFlushTime;
>>    syncOps++;
>>  }
>>
>> That is calling thew actual syncfs aka hflush
>
> No.  syncfs is hdfs-200 thing.   hflush is hdfs-265.  In 0.20, we had
> it rigged so that if hdfs-200 was present, we'd notice it and call
> syncFs on any sync invocation.
>

Yes, understood as well, I meant it called as you say syncfs, now we
call hflush.

>> Why are we calling it? It is called internally anyways after every few
>> hundred bytes it says here
>>
>>    synchronized void checkAndWriteSync() throws IOException {
>>      if (sync != null &&
>>          out.getPos() >= lastSyncPos+SYNC_INTERVAL) { // time to emit sync
>>        sync();
>>      }
>>    }
>>
>> Just when a KV is say 10KB then it is basically in between nearly
>> every KV (unless the HLogKey + KeyValue is minute).
>>
>
> I suppose it should come as no surprise that if actual SF sync is
> managed internally, it won't match KV boundaries.   I think it may
> prove of use still though Lars letting the SF parse pick up again
> after its crossed a corrupt patch.  I haven't checked the code though
> and maybe we want to fail if SF has any corruption in it?

Yes, as per the above. Makes sense as long as we use SequenceFile.

> Thanks for digging in on this Lars.
> St.Ack

Still digging :)

So with making Writer implement Syncable, does that make sense? Since
that is what we really want, have HLog call the same functions on
Writer for hsync() and hflush(). Right? That would enforce the right
interface on implementing Writer classes. Just wondering.

Reply via email to