[
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988189#comment-14988189
]
Samir Ahmic commented on HBASE-14223:
-------------------------------------
Thanks for the clarification, [~enis].
I did not explain well the part about when the splitlog task is put on the znode. Here is
a more precise comparison of the cases with and without the filter:
With filter:
{code}
2015-11-03 10:14:34,080 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: put up splitlog task at znode
/hbase/splitWAL/WALs%2Fhnode2%2C16000%2C1446541683349-splitting%2Fhnode2%252C16000%252C1446541683349.1446541986629
{code}
Without filter:
{code}
2015-11-03 10:18:43,747 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: put up splitlog task at znode
/hbase/splitWAL/WALs%2Fhnode2%2C16000%2C1446541683349-splitting%2Fhnode2%252C16000%252C1446541683349.meta.1446541987763.meta
{code}
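For context, the filter being discussed is essentially a Hadoop {{PathFilter}} applied when
listing the WAL files of the dead server. Below is a minimal sketch of a meta-only filter,
assuming meta WALs are identified by a ".meta" filename suffix; the class and constant names
are illustrative, not HBase's actual implementation:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

/** Illustrative meta-WAL filter; assumes meta WAL file names end with ".meta". */
public class MetaWalFilter implements PathFilter {
  private static final String META_WAL_SUFFIX = ".meta"; // assumed naming convention

  @Override
  public boolean accept(Path path) {
    // Keep only the WAL files written for hbase:meta; everything else is skipped.
    return path.getName().endsWith(META_WAL_SUFFIX);
  }
}
{code}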
What I meant by a deformed file is: why is the .meta suffix removed in the case with the
filter, while the same suffix is present in the case without the filter? On a related note, I
see that on the master branch SSH has been replaced with ServerCrashProcedure, which processes
failed servers, and in the logs I see these lines:
{code}
2015-11-03 10:18:43,350 DEBUG [hnode1:16000.activeMasterManager]
procedure2.ProcedureExecutor: Procedure ServerCrashProcedure
serverName=hnode2,16000,1446541683349, shouldSplitWal=true, carryingMeta=false
id=855 state=RUNNABLE:SERVER_CRASH_START added to the store.
2015-11-03 10:18:43,618 DEBUG [ProcedureExecutor-0]
procedure.ServerCrashProcedure: Splitting logs from hnode2,16000,1446541683349;
region count=0
{code}
This indicates that the server was processed as a regionserver not carrying the hbase:meta
table, but in my test case hnode2 was the active master, which means it was carrying the meta
table for sure.
Could this also be part of the problem?
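To make the carryingMeta=false question above more concrete, here is a minimal sketch of the
kind of check involved; it assumes the master flags a crashed server as carrying meta by
comparing it against the last meta location it knows about (the class and method names are
hypothetical stand-ins, not HBase's actual ServerCrashProcedure code):
{code}
import org.apache.hadoop.hbase.ServerName;

/**
 * Illustrative only, not HBase's actual logic: a crashed server is treated as
 * carrying hbase:meta only if it matches the meta location the master currently
 * knows about. If meta was closed (or its location already cleared) before the
 * crash, this returns false even though the server's meta WAL files may still
 * be sitting in its -splitting directory.
 */
public final class CarryingMetaCheck {
  private CarryingMetaCheck() {}

  public static boolean isCarryingMeta(ServerName crashedServer,
                                       ServerName lastKnownMetaLocation) {
    return lastKnownMetaLocation != null
        && lastKnownMetaLocation.equals(crashedServer);
  }
}
{code}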
> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch
>
>
> When an RS opens meta, and later closes it, the WAL (FSHLog) is not closed.
> The last WAL file just sits there in the RS WAL directory. If the RS stops
> gracefully, the WAL file for meta is deleted. If the RS aborts, however, the WAL for
> meta is not cleaned up. It is also not split (which is correct) since the master
> determines that the RS no longer hosts meta at the time of the RS abort.
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}}
> directories left uncleaned:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time:
> {code}
> 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740]
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created:
> {code}
> 2015-06-05 03:15:11,707 INFO
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog:
> Rolled WAL
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> with entries=385, filesize=196.88 KB; new WAL
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later, the master did not see these WAL files:
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075
> INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
> master.SplitLogManager: started splitting 2 logs in
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
> for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300
> INFO [main-EventThread] wal.WALSplitter: Archived processed log
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> to
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497
> INFO [main-EventThread] wal.WALSplitter: Archived processed log
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
> to
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507
> WARN [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
> master.SplitLogManager: returning success without actually splitting and
> deleting all the log files in path
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508
> INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
> master.SplitLogManager: finished splitting (more than or equal to) 129135000
> bytes in 2 log files in
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
> in 4433ms
> {code}