Hi,was looking for some hbase recovery help,

 

A region split failed on my cluster (may be my first region split on
this cluster) and now getRow is failing for some rows, there are mostly
two exceptions in the logs, "region is closed" and "hbase needs EXECUTE
permissions on /home/hbase/.../129547833999.34232/data' (the data dir
does not exist, the dir above is an empty file).  Running hbase 0.2.1,
hadoop 0.17.1.

 

 

 

Restarting did not fix the region, just get the same exceptions on a
different region server. 

 
I also investigated with some hbase shell commands:
 
Describe succeeds (perhaps indicates root or meta is otay?):
list
describe 'profile'
 
Reenable did not fix it:
disable 'profile'
enable 'profile'
 
Count still fails on the offline regions, this is a good simple check:
count 'profile'
... exception ... region closed ... 10 times ...

 

 

 

The initial exceptions in hbase logs are about a split fail on
permissions, 

in hadoop logs the initial exceptions is permissions on the parent dir
of a data dir

getFileInfo(/home/hbase/profile/751151815/target/mapfiles/32735381133438
65674.1721228298/data)
from 10.200.140.72:40236: error:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r--

 

But that parent dir is actually an empty file (and hdfs files seem to
never have execute permissions):

$ bin/hadoop fs -lsr
/home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122
8298

/home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122
8298    <r 3>   0       2009-01-12 21:42        rw-rw-r--       hbase
hbase

 

 

 

 

Here are the initial exceptions that are followed by many region closed
exceptions on getRow:

 

Comment #10 <http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c10>  From
Leor Thomas <mailto:[email protected]>  2009-01-14 06:55:23 [reply
<http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- 

Was looking at the logs...
It seems like hbase got a file permissions exception on split/compact of
a
region at ~7pm Jan 13.  At ~7pm Jan 13 hbase was restarted and it there
are
similar permission exceptions.  The permissions might be about HDFS file
permissions, also possible they could be about local file permissions on
an
hdfs node.
 
I found an exception in a hadoop log that matches the permission
exception on
split compaction in the hbase region server log:
http://xadoop1:50070/logs/hadoop-hadoop-namenode-xadoop1.els1.log.2009-0
1-13
 
 
2009-01-13 19:15:11,654 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4
on 50020, call
getFileInfo(/home/hbase/profile/2146100670/target/mapfiles/7828971234340
639717.1721228298/data)
from 10.200.140.72:45050: error:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="7828971234340639717.1721228298":hbase:hbase:rw-r--r--
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="7828971234340639717.1721228298":hbase:hbase:rw-r--r--
        at
org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175
)
        at
org.apache.hadoop.dfs.PermissionChecker.checkTraverse(PermissionChecker.
java:131)
        at
org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecke
r.java:101)
        at
org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:421
4)
        at
org.apache.hadoop.dfs.FSNamesystem.checkTraverse(FSNamesystem.java:4189)
        at
org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1562)
        at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:440)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
2009-01-13 19:15:13,386 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
ask
10.200.140.72:50010 to delete  blk_-2803854957896101600
blk_-1365392011282362536 blk_-9016864291042902362

------- Comment #11
<http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c11>  From Leor Thomas
<mailto:[email protected]>  2009-01-14 07:24:15 [reply
<http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- 

Found this similar error in the hadoop log from the 12th.  There are
also many
pipeline file exceptions in the log, they maybe ok (ask jo).
 
http://xadoop1:50070/logs/hadoop-hadoop-namenode-xadoop1.els1.log.2009-0
1-12
 
2009-01-12 21:42:26,255 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.200.140.72:50010 is
added to
blk_-3434160565482012683 size 67
2009-01-12 21:42:26,336 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6
on 50020, call
getFileInfo(/home/hbase/profile/751151815/target/mapfiles/32735381133438
65674.1721228298/data)
from 10.200.140.72:40236: error:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r--
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r--
        at
org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175
)
        at
org.apache.hadoop.dfs.PermissionChecker.checkTraverse(PermissionChecker.
java:131)
        at
org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecke
r.java:101)
        at
org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:421
4)
        at
org.apache.hadoop.dfs.FSNamesystem.checkTraverse(FSNamesystem.java:4189)
        at
org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1562)
        at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:440)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
2009-01-12 21:42:26,545 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/home/pipeline/2009/01/12/20/parser/targeting/FileParser.bad.
blk_7288671402489388204
2009-01-12 21:42:27,349 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.200.140.57:50010 is
added to
blk_7288671402489388204 size 151904

------- Comment #12
<http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c12>  From Leor Thomas
<mailto:[email protected]>  2009-01-14 07:37:39 [reply
<http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- 

This seems like the original HBase error, ~9:30pm Jan 12
 
http://hadoop28.els1.veoh.com:60030/logs/hbase-hbase-regionserver-hadoop
28.els1.log.2009-01-12
 
2009-01-12 21:40:58,345 INFO org.apache.hadoop.hbase.regionserver.HLog:
New log
writer created at
/home/hbase/log_10.200.140.72_1225758739209_60020/hlog.dat.1231825258323
2009-01-12 21:40:58,345 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/home/hbase/log_10.200.140.72_1225758739209_60020/hlog.dat.1231819870115
whose
highest sequence/edit id is 3781258
2009-01-12 21:42:21,737 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region profile,,1228188068084
2009-01-12 21:42:24,960 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region profile,,1228188068084 in 3sec
2009-01-12 21:42:24,960 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
Starting split of region profile,,1228188068084
2009-01-12 21:42:24,985 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
closed profile,,1228188068084
2009-01-12 21:42:26,351 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
Compaction/Split
failed for region profile,,1228188068084
org.apache.hadoop.fs.permission.AccessControlException:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r--
        at
org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175
)
 
...
 
Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied:
user=hbase, access=EXECUTE,
inode="3273538113343865674.1721228298":hbase:hbase:rw-r--r--
        at
org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:175
)
 
...

------- Comment #13
<http://bugzilla.corp.veoh/show_bug.cgi?id=20004#c13>  From Leor Thomas
<mailto:[email protected]>  2009-01-14 07:43:10 [reply
<http://bugzilla.corp.veoh/process_bug.cgi#add_comment> ] ------- 

A warn a little higher up in that same hbase log:
 
2009-01-12 15:11:38,252 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region profile,,1228188068084
2009-01-12 15:11:57,078 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call getRow([...@546997a2, [...@606237f9, null,
9223372036854775807)
from 10.200.140.57:44068: output error
2009-01-12 15:11:57,078 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: doAsyncWrite threw exception java.io.IOException: Connection
reset
by peer
2009-01-12 15:12:10,135 INFO
org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region profile,,1228188068084 in 31sec





 

The data subdir in the exceptions exists in some mapfiles subdirs, but
there are also empty files similarly named in mapfiles, and those do not
have execute permissions and do not have data folders in them.  Here is
an lsr of target dirs:

 

had...@lthomas-lt /cygdrive/c/leor/bin/hadoop-0.17.1

$ bin/hadoop fs -lsr /home/hbase/profile | grep target

/home/hbase/profile/1124966101/target   <dir>           2009-01-13 22:24
rwxr-xr-x       hbase   hbase

/home/hbase/profile/1124966101/target/info      <dir>
2009-01-13 22:24        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1124966101/target/info/1480298320017890631.172122829
8       <r 3>   67      2009-01-13 22:24        rw-r--r--       hbase
hbase

/home/hbase/profile/1124966101/target/mapfiles  <dir>
2009-01-13 22:24        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1124966101/target/mapfiles/1480298320017890631.17212
28298   <r 3>   0       2009-01-13 22:24        rw-r--r--       hbase
hbase

/home/hbase/profile/1721228298/splits/1354500955/target <dir>
2009-01-13 22:24        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/splits/1354500955/target/info    <dir>
2009-01-13 22:24        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/splits/1354500955/target/info/53085121702
63958294.1721228298     <r 3>   67      2009-01-13 22:24
rw-r--r--       hbase

/home/hbase/profile/1721228298/splits/1354500955/target/mapfiles
<dir>           2009-01-13 22:24        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/splits/1354500955/target/mapfiles/5308512
170263958294.1721228298 <r 3>   0       2009-01-13 22:24
rw-r--r--       hbase

/home/hbase/profile/1721228298/target   <dir>           2008-12-01 19:21
rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target/info      <dir>
2009-01-13 19:14        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target/info/6825406745082624757  <r 3>
9       2009-01-13 19:14        rw-r--r--       hbase   hbase

/home/hbase/profile/1721228298/target/mapfiles  <dir>
2009-01-13 19:14        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target/mapfiles/6825406745082624757
<dir>           2009-01-13 19:12        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target/mapfiles/6825406745082624757/data
<r 3>   268550949       2009-01-13 19:12        rw-r--r--       hbase
hbase

/home/hbase/profile/1721228298/target/mapfiles/6825406745082624757/index
<r 3>   736519  2009-01-13 19:12        rw-r--r--       hbase   hbase

/home/hbase/profile/1721228298/target_archive   <dir>
2008-12-01 19:21        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target_archive/info      <dir>
2008-12-01 19:21        rwxr-xr-x       hbase   hbase

/home/hbase/profile/1721228298/target_archive/mapfiles  <dir>
2008-12-01 19:21        rwxr-xr-x       hbase   hbase

/home/hbase/profile/2146100670/target   <dir>           2009-01-13 19:15
rwxr-xr-x       hbase   hbase

/home/hbase/profile/2146100670/target/info      <dir>
2009-01-13 19:15        rwxr-xr-x       hbase   hbase

/home/hbase/profile/2146100670/target/info/7828971234340639717.172122829
8       <r 3>   67      2009-01-13 19:15        rw-r--r--       hbase
hbase

/home/hbase/profile/2146100670/target/mapfiles  <dir>
2009-01-13 19:15        rwxr-xr-x       hbase   hbase

/home/hbase/profile/2146100670/target/mapfiles/7828971234340639717.17212
28298   <r 3>   0       2009-01-13 19:15        rw-r--r--       hbase
hbase

/home/hbase/profile/751151815/target    <dir>           2009-01-12 21:42
rwxrwxr-x       hbase   hbase

/home/hbase/profile/751151815/target/info       <dir>
2009-01-12 21:42        rwxrwxr-x       hbase   hbase

/home/hbase/profile/751151815/target/info/3273538113343865674.1721228298
<r 3>   67      2009-01-12 21:42        rw-rw-rw-       hbase   hbase

/home/hbase/profile/751151815/target/info/3430916177563875594.1721228298
<r 3>   67      2009-01-12 21:42        rw-rw-r--       hbase   hbase

/home/hbase/profile/751151815/target/info/3965813641963086919.1721228298
<r 3>   67      2009-01-12 21:42        rw-rw-r--       hbase   hbase

/home/hbase/profile/751151815/target/mapfiles   <dir>
2009-01-12 21:42        rwxrwxr-x       hbase   hbase

/home/hbase/profile/751151815/target/mapfiles/3273538113343865674.172122
8298    <r 3>   0       2009-01-12 21:42        rw-rw-r--       hbase
hbase

/home/hbase/profile/751151815/target/mapfiles/3430916177563875594.172122
8298    <r 3>   0       2009-01-12 21:42        rw-rw-r--       hbase
hbase

/home/hbase/profile/751151815/target/mapfiles/3965813641963086919.172122
8298    <r 3>   0       2009-01-12 21:42        rw-rw-r--       hbase
hbase

 

had...@lthomas-lt /cygdrive/c/leor/bin/hadoop-0.17.1

Reply via email to