[ 
https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510654#comment-16510654
 ] 

Zach York commented on HBASE-20723:
-----------------------------------

[~yuzhih...@gmail.com] you're welcome to try that change, but as you can see 
from the log, it is already looking in the walDir (rootdir == walDir here).

 

[~rpednekar] The WALSplitter is tasked with splitting logs (WALs). Why wouldn't 
it be looking in the hbase.wal.dir?

From my understanding, the recovered edits should be in:
hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648/recovered.edits
However, that directory doesn't exist...
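
As a rough sketch of how that path is put together: the class and method names 
below are made up for illustration and are not the actual WALSplitter code. The 
layout (root/data/namespace/table/encoded-region/recovered.edits) is just what 
the log line and the path above show, with hbase.wal.dir used as the root.

```java
// Illustrative only: mirrors the directory layout seen in the log,
// not the real WALSplitter implementation.
final class RecoveredEditsPathSketch {

    // Builds <root>/data/<namespace>/<table>/<encodedRegion>/recovered.edits
    static String recoveredEditsDir(String root, String namespace,
                                    String table, String encodedRegion) {
        // Drop a single trailing slash so we never emit "//" in the result.
        String base = root.endsWith("/")
                ? root.substring(0, root.length() - 1)
                : root;
        return base + "/data/" + namespace + "/" + table + "/"
                + encodedRegion + "/recovered.edits";
    }

    public static void main(String[] args) {
        // hbase.wal.dir as given in the issue description.
        String walDir = "hdfs://mycluster/walontest";
        System.out.println(recoveredEditsDir(
                walDir, "default", "tb1", "b7fd7db5694eb71190955292b3ff7648"));
    }
}
```

Passing the hbase.rootdir value instead of the walDir would give the 
corresponding location under the root filesystem, which is the distinction 
being discussed in this issue.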

One thing a colleague of mine figured out recently is that edits aren't 
actually persisted to the WAL until either they reach a certain size or a time 
limit elapses, at which point an hsync() or hflush() is triggered. Since the VM 
didn't exit cleanly, I'm assuming this is what happened. Can you try loading 
more data in (still under the flush size/interval), but enough to cause an 
hsync to the WAL file, and see if you have the same issue?
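
To make the hypothesis concrete, here is a toy model of it. This is not HBase 
code: the class, the thresholds (64 characters, 1000 ms) and the crash method 
are all assumptions made up for illustration. It only shows why a single small 
write could vanish on an unclean shutdown if edits sit in a buffer until a size 
or time trigger fires.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a buffered log: edits become durable only when a size
// threshold or a time interval is reached. Not the HBase WAL implementation.
final class BufferedWalSketch {
    private final int syncSize;         // hypothetical size threshold (chars)
    private final long syncIntervalMs;  // hypothetical time threshold
    private final List<String> buffered = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();
    private int bufferedSize = 0;
    private long lastSyncMs;

    BufferedWalSketch(int syncSize, long syncIntervalMs, long nowMs) {
        this.syncSize = syncSize;
        this.syncIntervalMs = syncIntervalMs;
        this.lastSyncMs = nowMs;
    }

    // Appends an edit; the caller supplies the clock so runs are repeatable.
    void append(String edit, long nowMs) {
        buffered.add(edit);
        bufferedSize += edit.length(); // char count as a stand-in for bytes
        if (bufferedSize >= syncSize || nowMs - lastSyncMs >= syncIntervalMs) {
            sync(nowMs);
        }
    }

    // Stands in for hsync()/hflush(): moves buffered edits to durable storage.
    private void sync(long nowMs) {
        durable.addAll(buffered);
        buffered.clear();
        bufferedSize = 0;
        lastSyncMs = nowMs;
    }

    // Simulates an unclean exit: anything still buffered is lost.
    List<String> crashAndRecover() {
        buffered.clear();
        bufferedSize = 0;
        return new ArrayList<>(durable);
    }

    public static void main(String[] args) {
        BufferedWalSketch one = new BufferedWalSketch(64, 1000L, 0L);
        one.append("r001", 10L);
        System.out.println(one.crashAndRecover().size()); // prints 0

        BufferedWalSketch many = new BufferedWalSketch(64, 1000L, 0L);
        for (int i = 1; i <= 20; i++) {
            many.append(String.format("r%03d", i), 10L);
        }
        System.out.println(many.crashAndRecover().size()); // prints 16
    }
}
```

In the second run the 16th edit crosses the size threshold and triggers the 
sync, so those 16 survive while the last 4 are lost. That is the kind of 
difference I'd expect to see if this is what is going on.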

 

[~stack] You mentioned you also ran into this issue... Can you provide any more 
info on your reproduction?

 

As [~apurtell] mentioned on the original JIRA, we tested this thoroughly when 
making the original change and have had many customers run with this setting 
without issue. It's possible that the patch was backported incorrectly to the 
Azure version, but it seems like this might be expected behavior when the 
number of writes is below the threshold required to sync/flush to the WAL file 
stream.

> WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-20723
>                 URL: https://issues.apache.org/jira/browse/HBASE-20723
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.1.2
>            Reporter: Rohan Pednekar
>            Priority: Major
>         Attachments: logs.zip
>
>
> This is an Azure HDInsight HBase cluster with HDP 2.6. and HBase 
> 1.1.2.2.6.3.2-14 
> By default the underlying data is going to wasb://xxxxx@yyyyy/hbase 
> I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at 
> /mnt.
> hbase.wal.dir= hdfs://mycluster/walontest
> hbase.wal.dir.perms=700
> hbase.rootdir.perms=700
> hbase.rootdir= 
> wasb://XYZ[@hbaseperf.core.net|mailto:duohbase5ds...@duohbaseperf.blob.core.windows.net]/hbase
> Procedure to reproduce this issue:
> 1. create a table in hbase shell
> 2. insert a row in hbase shell
> 3. reboot the VM which hosts that region
> 4. scan the table in hbase shell and it is empty
> Looking at the region server logs:
> {code}
> 2018-06-12 22:08:40,455 INFO  [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] 
> wal.WALSplitter: This region's directory doesn't exist: 
> hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. 
> It is very likely that it was already split so it's safe to discard those 
> edits.
> {code}
> The log split/replay ignored the actual WAL because WALSplitter is looking 
> for the region directory in the hbase.wal.dir we specified rather than in 
> the hbase.rootdir.
> Looking at the source code,
>  
> [https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.20-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java#L519]
>  it uses the rootDir, which is walDir, as the tableDir root path.
> So if we use HBASE-17437 and the walDir and the HBase rootdir are on 
> different paths, or even on different filesystems, then #5, which uses 
> walDir as the tableDir root, is apparently wrong.
> CC: [~zyork], [~yuzhih...@gmail.com] Attached the logs for quick review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
