HBase 0.20 had a hack that would recognize whether Dhruba's HDFS-200
patch had been applied.  If it had, we'd do the open-for-append, close,
and reopen dance to recover edits written to an unclosed WAL/HLog file
(grep 'syncfs' in HLog on the 0.20 branch).
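
For reference, the dance amounted to something like the following (a
minimal sketch, assuming an HDFS build that carries the HDFS-200 append
patch; the WAL path here is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalRecoverySketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical path to an unclosed WAL/HLog file.
        Path wal = new Path("/hbase/.logs/server,60020,123/hlog.456");

        // Open-for-append, then close: on an HDFS-200 build this forces
        // the namenode to recover the lease and the true block lengths.
        FSDataOutputStream out = fs.append(wal);
        out.close();

        // Reopen for read; the reported length now covers synced edits.
        FSDataInputStream in = fs.open(wal);
        in.close();
      }
    }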

In HBase TRUNK, the above hackery was stripped out.  In TRUNK we are
leaning on the new hflush/HDFS-265 rather than HDFS-200.  With hflush,
when we call FSDataInputStream::available(), it returns the 'right'
answer (WALReaderFsDataInputStream::getPos() was added before an API
for this was available; HBASE-2069 is about using the new API instead
of this getPos fancy-dancing).
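
The fancy-dancing boils down to something like the following (a rough
sketch of the idea only, not the actual SequenceFileLogReader code):
report pos + available() from getPos() so the SequenceFile reader sees
the hflushed-but-unclosed tail that the namenode does not yet know
about.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;

    // Sketch only; the real wrapper lives in SequenceFileLogReader.
    class VisibleLengthInputStream extends FSDataInputStream {
      private boolean firstGetPos = true;

      VisibleLengthInputStream(FSDataInputStream in) throws IOException {
        super(in);
      }

      @Override
      public long getPos() throws IOException {
        if (firstGetPos) {
          firstGetPos = false;
          // available() asks the datanode, so pos + available() covers
          // bytes hflushed but never reported to the namenode.
          return super.getPos() + this.in.available();
        }
        return super.getPos();
      }
    }

HBASE-2069 would swap this for a direct query of the visible length
(e.g. the DFSDataInputStream::getVisibleLength() Nicolas mentions
below).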

It sounds like you need to do a bit of merging of the TRUNK group
commit with the old HBase code that exploited HDFS-200?

St.Ack

On Tue, Jan 26, 2010 at 12:35 PM, Nicolas Spiegelberg
<nspiegelb...@facebook.com> wrote:
> Hi,
>
> I am trying to backport the HLog group commit functionality to HBase 0.20.
> For proper reliability, I am working with Dhruba to get the 0.21 syncFs() 
> changes from HDFS ported back to HDFS 0.20 as well.  When going through a 
> peer review of the modified code, my group had a question about the 
> SequenceFileLogReader.java (WALReader).  I am hoping that you guys could be 
> of assistance.
>
> I know that there is an open issue [HBASE-2069] where HLog::splitLog() does
> not call DFSDataInputStream::getVisibleLength(), which would properly report
> hflushed, but unclosed, file lengths.  I believe the current workaround is to
> open an HDFS file in append mode & then close it, which would cause the
> namenode to get updates from the datanodes.  However, I don’t see that shim
> present in HLog::splitLog() on the 0.21 trunk.  Is this a pending issue to
> fix, or is calling FSDataInputStream::available() within
> WALReaderFsDataInputStream::getPos() sufficient to force the namenode to sync
> up with the datanodes?
>
> Nicolas Spiegelberg
>
