Hi,was looking for some hbase recovery help,
A region split failed on my cluster (may be my first region split on this cluster) and now getRow is failing for some rows, there are mostly two exceptions in the logs, "region is closed" and "hbase needs EXECUTE permissions on /home/hbase/.../129547833999.34232/data' (the data dir does not exist, the dir above is an empty file). Running hbase 0.2.1, hadoop 0.17.1. Restarting did not fix the region, just get the same exceptions on a different region server. I also investigated with some hbase shell commands: Describe succeeds (perhaps indicates root or meta is otay?): list describe 'profile' Reenable did not fix it: disable 'profile' enable 'profile' Count still fails on the offline regions, this is a good simple check: count 'profile' ... exception ... region closed ... 10 times ... The initial exceptions in hbase logs are about a split fail on permissions, in hadoop logs the initial exceptions is permissions on the parent dir of a data dir getFileInfo(/home/hbase/profile/751151815/target/mapfiles/32735381133438 65674.1721228298/data) from 10.200.140.72:40236: error: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r-- But that parent dir is actually an empty file (and hdfs files seem to never have execute permissions): $ bin/hadoop fs -lsr /home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122 8298 /home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122 8298 <r 3> 0 2009-01-12 21:42 rw-rw-r-- hbase hbase Here are the initial exceptions that are followed by many region closed exceptions on getRow: Comment #10 <http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c10> From Leor Thomas <mailto:[email protected]> 2009-01-14 06:55:23 [reply <http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- Was looking at the logs... It seems like hbase got a file permissions exception on split/compact of a region at ~7pm Jan 13. At ~7pm Jan 13 hbase was restarted and it there are similar permission exceptions. The permissions might be about HDFS file permissions, also possible they could be about local file permissions on an hdfs node. I found an exception in a hadoop log that matches the permission exception on split compaction in the hbase region server log: http://xadoop1:50070/logs/hadoop-hadoop-namenode-xadoop1.els1.log.2009-0 1-13 2009-01-13 19:15:11,654 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 50020, call getFileInfo(/home/hbase/profile/2146100670/target/mapfiles/7828971234340 639717.1721228298/data) from 10.200.140.72:45050: error: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="7828971234340639717.1721228298":hbase:hbase:rw-r--r-- org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="7828971234340639717.1721228298":hbase:hbase:rw-r--r-- at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175 ) at org.apache.hadoop.dfs.PermissionChecker.checkTraverse(PermissionChecker. java:131) at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecke r.java:101) at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:421 4) at org.apache.hadoop.dfs.FSNamesystem.checkTraverse(FSNamesystem.java:4189) at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1562) at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:440) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896) 2009-01-13 19:15:13,386 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask 10.200.140.72:50010 to delete blk_-2803854957896101600 blk_-1365392011282362536 blk_-9016864291042902362 ------- Comment #11 <http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c11> From Leor Thomas <mailto:[email protected]> 2009-01-14 07:24:15 [reply <http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- Found this similar error in the hadoop log from the 12th. There are also many pipeline file exceptions in the log, they maybe ok (ask jo). http://xadoop1:50070/logs/hadoop-hadoop-namenode-xadoop1.els1.log.2009-0 1-12 2009-01-12 21:42:26,255 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.200.140.72:50010 is added to blk_-3434160565482012683 size 67 2009-01-12 21:42:26,336 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 50020, call getFileInfo(/home/hbase/profile/751151815/target/mapfiles/32735381133438 65674.1721228298/data) from 10.200.140.72:40236: error: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r-- org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r-- at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175 ) at org.apache.hadoop.dfs.PermissionChecker.checkTraverse(PermissionChecker. java:131) at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecke r.java:101) at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:421 4) at org.apache.hadoop.dfs.FSNamesystem.checkTraverse(FSNamesystem.java:4189) at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1562) at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:440) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896) 2009-01-12 21:42:26,545 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /home/pipeline/2009/01/12/20/parser/targeting/FileParser.bad. blk_7288671402489388204 2009-01-12 21:42:27,349 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.200.140.57:50010 is added to blk_7288671402489388204 size 151904 ------- Comment #12 <http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c12> From Leor Thomas <mailto:[email protected]> 2009-01-14 07:37:39 [reply <http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- This seems like the original HBase error, ~9:30pm Jan 12 http://hadoop28.els1.veoh.com:60030/logs/hbase-hbase-regionserver-hadoop 28.els1.log.2009-01-12 2009-01-12 21:40:58,345 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /home/hbase/log_10.200.140.72_1225758739209_60020/hlog.dat.1231825258323 2009-01-12 21:40:58,345 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /home/hbase/log_10.200.140.72_1225758739209_60020/hlog.dat.1231819870115 whose highest sequence/edit id is 3781258 2009-01-12 21:42:21,737 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region profile,,1228188068084 2009-01-12 21:42:24,960 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profile,,1228188068084 in 3sec 2009-01-12 21:42:24,960 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region profile,,1228188068084 2009-01-12 21:42:24,985 INFO org.apache.hadoop.hbase.regionserver.HRegion: closed profile,,1228188068084 2009-01-12 21:42:26,351 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region profile,,1228188068084 org.apache.hadoop.fs.permission.AccessControlException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r-- at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175 ) ... Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r-- at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175 ) ... ------- Comment #13 <http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c13> From Leor Thomas <mailto:[email protected]> 2009-01-14 07:43:10 [reply <http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- A warn a little higher up in that same hbase log: 2009-01-12 15:11:38,252 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region profile,,1228188068084 2009-01-12 15:11:57,078 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getRow([...@546997a2, [...@606237f9, null, 9223372036854775807) from 10.200.140.57:44068: output error 2009-01-12 15:11:57,078 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: doAsyncWrite threw exception java.io.IOException: Connection reset by peer 2009-01-12 15:12:10,135 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region profile,,1228188068084 in 31sec The data subdir in the exceptions exists in some mapfiles subdirs, but there are also empty files similarly named in mapfiles, and those do not have execute permissions and do not have data folders in them. Here is an lsr of target dirs: had...@lthomas-lt /cygdrive/c/leor/bin/hadoop-0.17.1 $ bin/hadoop fs -lsr /home/hbase/profile | grep target /home/hbase/profile/1124966101/target <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1124966101/target/info <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1124966101/target/info/1480298320017890631.172122829 8 <r 3> 67 2009-01-13 22:24 rw-r--r-- hbase hbase /home/hbase/profile/1124966101/target/mapfiles <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1124966101/target/mapfiles/1480298320017890631.17212 28298 <r 3> 0 2009-01-13 22:24 rw-r--r-- hbase hbase /home/hbase/profile/1721228298/splits/1354500955/target <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/splits/1354500955/target/info <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/splits/1354500955/target/info/53085121702 63958294.1721228298 <r 3> 67 2009-01-13 22:24 rw-r--r-- hbase /home/hbase/profile/1721228298/splits/1354500955/target/mapfiles <dir> 2009-01-13 22:24 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/splits/1354500955/target/mapfiles/5308512 170263958294.1721228298 <r 3> 0 2009-01-13 22:24 rw-r--r-- hbase /home/hbase/profile/1721228298/target <dir> 2008-12-01 19:21 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target/info <dir> 2009-01-13 19:14 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target/info/6825406745082624757 <r 3> 9 2009-01-13 19:14 rw-r--r-- hbase hbase /home/hbase/profile/1721228298/target/mapfiles <dir> 2009-01-13 19:14 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target/mapfiles/6825406745082624757 <dir> 2009-01-13 19:12 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target/mapfiles/6825406745082624757/data <r 3> 268550949 2009-01-13 19:12 rw-r--r-- hbase hbase /home/hbase/profile/1721228298/target/mapfiles/6825406745082624757/index <r 3> 736519 2009-01-13 19:12 rw-r--r-- hbase hbase /home/hbase/profile/1721228298/target_archive <dir> 2008-12-01 19:21 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target_archive/info <dir> 2008-12-01 19:21 rwxr-xr-x hbase hbase /home/hbase/profile/1721228298/target_archive/mapfiles <dir> 2008-12-01 19:21 rwxr-xr-x hbase hbase /home/hbase/profile/2146100670/target <dir> 2009-01-13 19:15 rwxr-xr-x hbase hbase /home/hbase/profile/2146100670/target/info <dir> 2009-01-13 19:15 rwxr-xr-x hbase hbase /home/hbase/profile/2146100670/target/info/7828971234340639717.172122829 8 <r 3> 67 2009-01-13 19:15 rw-r--r-- hbase hbase /home/hbase/profile/2146100670/target/mapfiles <dir> 2009-01-13 19:15 rwxr-xr-x hbase hbase /home/hbase/profile/2146100670/target/mapfiles/7828971234340639717.17212 28298 <r 3> 0 2009-01-13 19:15 rw-r--r-- hbase hbase /home/hbase/profile/751151815/target <dir> 2009-01-12 21:42 rwxrwxr-x hbase hbase /home/hbase/profile/751151815/target/info <dir> 2009-01-12 21:42 rwxrwxr-x hbase hbase /home/hbase/profile/751151815/target/info/3273538113343865674.1721228298 <r 3> 67 2009-01-12 21:42 rw-rw-rw- hbase hbase /home/hbase/profile/751151815/target/info/3430916177563875594.1721228298 <r 3> 67 2009-01-12 21:42 rw-rw-r-- hbase hbase /home/hbase/profile/751151815/target/info/3965813641963086919.1721228298 <r 3> 67 2009-01-12 21:42 rw-rw-r-- hbase hbase /home/hbase/profile/751151815/target/mapfiles <dir> 2009-01-12 21:42 rwxrwxr-x hbase hbase /home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122 8298 <r 3> 0 2009-01-12 21:42 rw-rw-r-- hbase hbase /home/hbase/profile/751151815/target/mapfiles/3430916177563875594.172122 8298 <r 3> 0 2009-01-12 21:42 rw-rw-r-- hbase hbase /home/hbase/profile/751151815/target/mapfiles/3965813641963086919.172122 8298 <r 3> 0 2009-01-12 21:42 rw-rw-r-- hbase hbase had...@lthomas-lt /cygdrive/c/leor/bin/hadoop-0.17.1
