[jira] [Commented] (HBASE-4695) WAL logs get deleted before region server can fully flush

gaojinchao (Commented) (JIRA) Mon, 31 Oct 2011 00:55:00 -0700

    [ https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139986#comment-13139986 ]


gaojinchao commented on HBASE-4695:
-----------------------------------

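I tested the latest trunk version on a real cluster and the test passed. The WAL files are moved to /hbase/.oldlogs, and the log directory is deleted only after all regions have closed: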

Region Server logs:
2011-10-31 03:32:42,922 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
C3S31,20020,1320034091400
2011-10-31 03:32:46,974 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
C3S31,20020,1320034091400; all regions closed.
2011-10-31 03:32:48,633 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
Moved 7 log files to /hbase/.oldlogs
2011-10-31 03:32:49,200 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
C3S31,20020,1320034091400; zookeeper connection closed.

Namenode logs:
2011-10-31 03:32:46,988 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(192)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=listStatus  src=/hbase/.logs/C3S31,20020,1320034091400      
perm=root:supergroup:rwxr-xr-x
2011-10-31 03:32:46,991 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320045179340
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320045179340 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,992 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046155808
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046155808 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,994 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046186294
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046186294 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,996 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046216288
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046216288 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:46,998 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046255166
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046255166 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:47,206 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(192)) - ugi=webuser,webgroup  ip=/158.1.130.33 
       cmd=listStatus  src=/hbase/.logs/C3S31,20020,1320034091400      
perm=root:supergroup:rwxr-xr-x
2011-10-31 03:32:48,518 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046295501
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046295501 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:48,633 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(177)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=rename      
src=/hbase/.logs/C3S31,20020,1320034091400/C3S31%2C20020%2C1320034091400.1320046325013
  dst=/hbase/.oldlogs/C3S31%2C20020%2C1320034091400.1320046325013 
perm=root:supergroup:rw-r--r--
2011-10-31 03:32:48,650 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(206)) - ugi=root,root,sfcb    ip=/158.1.130.31 
       cmd=delete      src=/hbase/.logs/C3S31,20020,1320034091400      
2011-10-31 03:32:49,389 INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditEvent(206)) - ugi=root,root,sfcb    ip=/158.1.130.32 
       cmd=delete      src=/hbase/.META./1028785192/.tmp       
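
The ordering shown in these logs can be checked mechanically. The following is a minimal sketch, not part of the patch: it compares the "all regions closed" timestamp from the region server log with the rename and delete timestamps from the namenode audit log. The helper names (parse_ts, premature_operations) are made up for illustration, and it assumes the clocks on the two hosts are closely synchronized.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse a log4j-style timestamp such as '2011-10-31 03:32:46,974'."""
    return datetime.strptime(text, TS_FORMAT)


# Timestamps copied from the region server and namenode logs above.
all_regions_closed = parse_ts("2011-10-31 03:32:46,974")
wal_operations = {
    "first rename to /hbase/.oldlogs": parse_ts("2011-10-31 03:32:46,991"),
    "last rename to /hbase/.oldlogs": parse_ts("2011-10-31 03:32:48,633"),
    "delete of /hbase/.logs/<server>": parse_ts("2011-10-31 03:32:48,650"),
}


def premature_operations(closed_at, operations):
    """Return the names of WAL operations that happened before closed_at."""
    return [name for name, ts in operations.items() if ts < closed_at]


early = premature_operations(all_regions_closed, wal_operations)
print("WAL operations before all regions closed:", early)
```

An empty list means every rename and delete on the WAL directory happened after the regions were closed, which is the behavior this issue asks for.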


                
> WAL logs get deleted before region server can fully flush
> ---------------------------------------------------------
>
>                 Key: HBASE-4695
>                 URL: https://issues.apache.org/jira/browse/HBASE-4695
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.90.4
>            Reporter: jack levin
>            Assignee: gaojinchao
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt
>
>
> To reproduce the problem, do the following:
> 1. Check the /hbase/.logs/XXXX directory to see if you have WAL logs for the 
> region server you are shutting down.
> 2. Execute kill <pid> (where pid is the regionserver pid).
> 3. Watch the regionserver log as it starts flushing; it shows how many 
> regions are left to flush:
> 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 489 regions to close
> 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 116 regions to close
> 4. Check /hbase/.logs/XXXX -- you will notice that it has disappeared.
> 5. Check namenode logs:
> 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: 
> ugi=root ip=/10.101.1.5 cmd=delete 
> src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
> Note that if you kill -9 the RS now, and it crashes on flush, you won't have 
> any WAL logs to replay.  We need to make sure that logs are deleted or moved 
> out only when the RS has fully flushed. Otherwise it's possible to lose data.
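
The requirement in the last paragraph of the description can be illustrated with a toy model. This is only a sketch of the ordering, not HBase code: the class ToyRegionServer and its methods are hypothetical names, and real flushing and HDFS operations are replaced with in-memory bookkeeping.

```python
class ToyRegionServer:
    """Toy model of the shutdown ordering; not HBase code."""

    def __init__(self, regions, wal_files):
        self.unflushed = set(regions)    # regions still holding unflushed edits
        self.wal_files = list(wal_files)  # WAL files in the live log directory
        self.archived = []                # WAL files moved out of the way

    def flush_and_close(self, region):
        """Pretend to flush the region and mark it closed."""
        self.unflushed.discard(region)

    def archive_wals(self):
        """Move WAL files out, but only once every region has flushed."""
        if self.unflushed:
            raise RuntimeError(
                f"{len(self.unflushed)} regions still unflushed; keeping WALs"
            )
        self.archived, self.wal_files = self.wal_files, []

    def shutdown(self):
        """Flush and close all regions first, then archive the WAL files."""
        for region in sorted(self.unflushed):
            self.flush_and_close(region)
        self.archive_wals()


rs = ToyRegionServer(["r1", "r2"], ["wal.1", "wal.2"])
rs.shutdown()
print(rs.archived)
```

With the guard in archive_wals, a crash partway through the flush loop leaves the WAL files in place, so they remain available for replay.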

