[ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511621#comment-16511621 ]
Tak Lon (Stephen) Wu commented on HBASE-20723:
----------------------------------------------

Regarding the [hflush in DFSOutputStream|https://github.com/apache/hadoop/blob/release-2.8.3-RC0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L525] used by the WAL's ProtobufLogWriter: as far as I understand, it writes blocks/packets to HDFS but not a complete WAL file. The blocks/packets that have been sent are a group of writes that are not combined into a single file until the WAL is closed. (Let me know if I'm wrong.)

I found this problem when testing HBase on S3 with a 3-node cluster and the WAL on HDFS. I wrote an HBase client that sequentially writes N records (key and value are both the numbers 1 to N), terminated the assigned region server with `kill -9 $pid`, and restarted it. The regions being written to were reassigned to another region server within a few seconds and the client program completed without errors, but when I verified the records, a few were missing.

> WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-20723
>                 URL: https://issues.apache.org/jira/browse/HBASE-20723
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.1.2
>            Reporter: Rohan Pednekar
>            Priority: Major
>         Attachments: logs.zip
>
>
> This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase
> 1.1.2.2.6.3.2-14
> By default the underlying data is going to wasb://xxxxx@yyyyy/hbase
> I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at
> /mnt.
> hbase.wal.dir= hdfs://mycluster/walontest
> hbase.wal.dir.perms=700
> hbase.rootdir.perms=700
> hbase.rootdir=
> wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase
> Procedure to reproduce this issue:
> 1. create a table in hbase shell
> 2. insert a row in hbase shell
> 3. reboot the VM which hosts that region
> 4. scan the table in hbase shell and it is empty
> Looking at the region server logs:
> {code:java}
> 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1]
> wal.WALSplitter: This region's directory doesn't exist:
> hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648.
> It is very likely that it was already split so it's safe to discard those
> edits.
> {code}
> The log split/replay ignored actual WAL due to WALSplitter is looking for the
> region directory in the hbase.wal.dir we specified rather than the
> hbase.rootdir.
> Looking at the source code,
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java
> it uses the rootDir, which is walDir, as the tableDir root path.
> So if we use HBASE-17437, waldir and hbase rootdir are in different path or
> even in different filesystem, then the #5 uses walDir as tableDir is
> apparently wrong.
> CC: [~zyork], [~yuzhih...@gmail.com]
> Attached the logs for quick review.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)