HBase 0.20 had a hack that would recognize the presence of Dhruba's HDFS-200. If it had been applied, we would do the open-for-append, close, and reopen dance to recover edits written to an unclosed WAL/HLog file (grep 'syncFs' in HLog on the 0.20 branch).
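For reference, the dance looked roughly like this (a from-memory sketch, not the exact 0.20 branch code; the class and method names are mine, and in practice the append raced with lease recovery and needed a retry loop):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// From-memory sketch of the 0.20/HDFS-200 recovery dance, not the exact
// branch code.
public class WalRecoverySketch {
  public static FSDataInputStream recoverAndOpen(Configuration conf, Path wal)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    // Opening the unclosed WAL for append forces the namenode to recover the
    // lease and reconcile block lengths with the datanodes; closing right
    // away commits that length. (In practice this threw until lease recovery
    // finished, so the real code retried.)
    fs.append(wal).close();
    // The reopened stream now sees the file's true, recovered length.
    return fs.open(wal);
  }
}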
In HBase TRUNK, the above hackery was stripped out. In TRUNK we are leaning on the new hflush/HDFS-265 rather than HDFS-200. For hflush, when we call FSDataInputStream::available(), it returns the 'right' answer (WALReaderFsDataInputStream::getPos() was added before an API was available; HBASE-2069 is about using the new API instead of this getPos fancy-dancing -- see the sketch below your quoted mail for what that workaround amounts to).

It sounds like you need to do a bit of merging of the TRUNK group commit and the old HBase code that exploited HDFS-200?

St.Ack

On Tue, Jan 26, 2010 at 12:35 PM, Nicolas Spiegelberg <nspiegelb...@facebook.com> wrote:
> Hi,
>
> I am trying to backport the HLog group commit functionality to HBase 0.20.
> For proper reliability, I am working with Dhruba to get the 0.21 syncFs()
> changes from HDFS ported back to HDFS 0.20 as well. When going through a
> peer review of the modified code, my group had a question about
> SequenceFileLogReader.java (WALReader). I am hoping that you guys could be
> of assistance.
>
> I know that there is an open issue [HBASE-2069] where HLog::splitLog() does
> not call DFSDataInputStream::getVisibleLength(), which would properly sync
> hflushed, but unclosed, file lengths. I believe the current workaround is to
> open an HDFS file in append mode & then close it, which would cause the
> namenode to get updates from the datanodes. However, I don’t see that shim
> present in HLog::splitLog() on the 0.21 trunk. Is this a pending issue to fix
> or is calling FSDataInputStream::available() within
> WALReaderFsDataInputStream::getPos() sufficient to force the namenode to
> sync up with the datanodes?
>
> Nicolas Spiegelberg
>
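P.S. On whether available() within getPos() is sufficient: here is roughly what the workaround amounts to (an illustrative sketch only, not the actual SequenceFileLogReader code; the class and helper names are mine). The idea is to ask the read path how many bytes are actually readable past the current position instead of trusting the namenode's stale length for an unclosed file:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch, not the actual SequenceFileLogReader code.
public class VisibleLengthSketch {
  // Estimate the readable length of a possibly-unclosed WAL by asking the
  // open stream itself rather than the namenode's file-length metadata.
  public static long visibleLength(FileSystem fs, Path wal) throws IOException {
    FSDataInputStream in = fs.open(wal);
    try {
      // getPos() is 0 right after open(); available() reports how many bytes
      // can actually be read past that position. Note available() returns an
      // int, so this under-reports a tail longer than 2GB.
      return in.getPos() + in.available();
    } finally {
      in.close();
    }
  }
}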