[
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987872#comment-14987872
]
Enis Soztutar commented on HBASE-14223:
---------------------------------------
Thanks for following up.
bq. So I have removed the filter parameter from SplitLogManager#getFileList()
The filter is needed because we split the meta WALs and the regular WALs
differently. Removing the filter has the effect of splitting (and hence
recovering) meta WALs that did not need to be recovered. This may end up
bringing back old data that has already been deleted from meta, so it is not
safe.
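A minimal sketch of what the filter does when listing a {{-splitting}} directory, in plain Java with no Hadoop dependency. WalFilters, isMetaWal, select, META_FILTER and NON_META_FILTER are hypothetical names, not the HBase API; the only naming assumption is that meta WAL file names end in ".meta", as the file names in the logs below do. The listing reuses the four WAL names from this issue's logs.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of splitting a WAL listing into meta and non-meta files.
final class WalFilters {

  // Assumption: meta WAL names end in ".meta" (see the file names in the logs).
  static boolean isMetaWal(String fileName) {
    return fileName.endsWith(".meta");
  }

  static final Predicate<String> META_FILTER = WalFilters::isMetaWal;
  static final Predicate<String> NON_META_FILTER = META_FILTER.negate();

  // Keep only the file names accepted by the given filter.
  static List<String> select(List<String> listing, Predicate<String> filter) {
    return listing.stream().filter(filter).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> listing = List.of(
        "os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta",
        "os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta",
        "os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436",
        "os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329");
    // The server no longer hosted meta at abort time: only regular WALs are split.
    System.out.println(select(listing, NON_META_FILTER).size()); // 2
    // Dropping the filter would also pick up (and replay) the two stale meta WALs.
    System.out.println(select(listing, name -> true).size()); // 4
  }
}
```

With the non-meta filter only the two regular WALs are selected, which matches the "started splitting 2 logs" line in the master log; an accept-all filter would also select the two meta WALs, which is the unsafe case described above.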
bq. I'm not sure why the file name was deformed (I suspect the PathFilter).
The znodes have the HDFS path encoded in them. The split task is coordinated
via znodes, so the name of the file to split is turned into a znode path.
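To illustrate why the name can look deformed, here is a sketch that assumes the task znode name is simply the URL-encoded WAL path. SplitTaskZnodes, its methods, and the /hbase/splitWAL parent znode are illustrative assumptions, not taken from this issue or the HBase source. WAL file names already contain %2C for the commas in the server name, so encoding them again turns each %2C into %252C and every / into %2F; decoding the znode name restores the original path.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a WAL path to a single znode name by URL-encoding it.
// This illustrates the idea only; it is not necessarily HBase's exact scheme.
final class SplitTaskZnodes {

  // Every '/' becomes %2F, and a literal '%' (already in WAL names as %2C) becomes %25.
  static String encode(String walPath) {
    return URLEncoder.encode(walPath, StandardCharsets.UTF_8);
  }

  // Reverse of encode(): recovers the original WAL path from a znode name.
  static String decode(String znodeName) {
    return URLDecoder.decode(znodeName, StandardCharsets.UTF_8);
  }

  // Full task znode path under an assumed parent znode.
  static String taskZnode(String parentZnode, String walPath) {
    return parentZnode + "/" + encode(walPath);
  }

  public static void main(String[] args) {
    String wal = "/apps/hbase/data/WALs/"
        + "os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/"
        + "os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285"
        + "..meta.1433470511501.meta";
    String znode = taskZnode("/hbase/splitWAL", wal); // parent znode is an assumption
    System.out.println(znode); // the file name part now shows %252C instead of %2C
    // The encoded name contains no '/', so the last '/' separates parent and task name.
    String name = znode.substring(znode.lastIndexOf('/') + 1);
    System.out.println(decode(name).equals(wal)); // true
  }
}
```

Requires Java 10 or later for the Charset overloads of URLEncoder.encode and URLDecoder.decode.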
> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed.
> The last WAL file just sits in the RS WAL directory. If the RS stops
> gracefully, the meta WAL file is deleted. If the RS aborts instead, the meta
> WAL is not cleaned up. It is also not split (which is correct), since the master
> determines that the RS no longer hosts meta at the time of the RS abort.
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}}
> directories left uncleaned:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14 /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time:
> {code}
> 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740] zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created:
> {code}
> 2015-06-05 03:15:11,707 INFO [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta with entries=385, filesize=196.88 KB; new WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM later killed the region server, the master did not see these WAL files:
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: started splitting 2 logs in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507 WARN [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: finished splitting (more than or equal to) 129135000 bytes in 2 log files in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] in 4433ms
> {code}
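A simplified model of why the {{-splitting}} directories stay around, assuming the master removes the directory with a non-recursive delete after the split WALs have been archived to oldWALs. Local temp files stand in for HDFS, and SplittingDirCleanup, cleanUp and scenario are hypothetical names. A meta WAL that the filter skipped keeps the directory non-empty, so the delete fails and the directory is left behind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified model of the master-side cleanup; local files stand in for HDFS.
final class SplittingDirCleanup {

  // Move the WALs that were split into the archive dir, then try a
  // non-recursive delete of the "-splitting" dir. Returns true if it was removed.
  static boolean cleanUp(Path splittingDir, List<Path> splitWals, Path archiveDir)
      throws IOException {
    for (Path wal : splitWals) {
      Files.move(wal, archiveDir.resolve(wal.getFileName()));
    }
    try {
      Files.delete(splittingDir); // refuses to delete a directory that still has files
      return true;
    } catch (DirectoryNotEmptyException e) {
      return false; // the directory is left behind
    }
  }

  // Builds one "-splitting" dir holding a regular WAL and, optionally, a meta WAL
  // that the non-meta filter skipped; returns whether cleanUp removed the directory.
  static boolean scenario(boolean withMetaWal) {
    try {
      Path root = Files.createTempDirectory("hbase-14223");
      Path splitting = Files.createDirectory(root.resolve("rs,16020,1433466904285-splitting"));
      Path archive = Files.createDirectory(root.resolve("oldWALs"));
      Path regular = Files.createFile(splitting.resolve("rs.default.1433475074436"));
      if (withMetaWal) {
        Files.createFile(splitting.resolve("rs..meta.1433474111645.meta"));
      }
      return cleanUp(splitting, List.of(regular), archive);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(scenario(false)); // true: only the split WAL was inside
    System.out.println(scenario(true));  // false: the leftover meta WAL keeps the dir
  }
}
```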
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)