If possible, please move to TRUNK instead. Most of the below have
been addressed there (I can send you a patch if you want to run hbase
TRUNK on hadoop 0.15.x).
Further comments inline below:
Lars George wrote:
Hi Stack,
Yes, it happens every time I insert particular rows. Before, it would
fail every now and then, but now that all the "good" rows have been
inserted I am stuck with the ones that will not insert. And I am sure
they did insert once, with no error. So they are in there in limbo, but
I cannot retrieve, delete, or re-insert them.
The FAQ mentions that I can switch on debugging through the UI, but I
cannot see where. I am using version 0.15.1; is that supposed to have
that option, or do I need to go the log4j.properties-plus-restart
route?
This is a post-0.15 release feature (It says post-0.15.x in the FAQ).
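For 0.15.1 you will have to go the log4j.properties-plus-restart route.
Roughly like the below; this is only a sketch, and the conf path is a
guess based on your log locations, so point it at whichever
log4j.properties your hbase daemons actually read:

  # Enable DEBUG for the hbase classes (and, if you want the hdfs side
  # too, the dfs classes), then restart the master and region servers
  # so the change takes effect.
  echo 'log4j.logger.org.apache.hadoop.hbase=DEBUG' >> /usr/local/hadoop/conf/log4j.properties
  echo 'log4j.logger.org.apache.hadoop.dfs=DEBUG' >> /usr/local/hadoop/conf/log4j.properties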
I have errors all the time, which quite frankly worry me. Here is a
list of what I see so far:
1. At startup
==>
/usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-40.worldlingo.com.log
<==
2008-01-03 14:11:22,512 WARN org.apache.hadoop.util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2008-01-03 14:11:29,808 WARN org.apache.hadoop.hbase.HRegionServer:
Processing message (Retry: 0)
java.io.IOException: java.io.IOException:
java.util.ConcurrentModificationException
....
Fixed in TRUNK
2. Sporadically
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_-8931657506153335343.
Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_3775459202881005176.
Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_-283089329129695997.
Block not found in blockMap.
2008-01-03 21:32:00,644 WARN org.apache.hadoop.dfs.DataNode:
java.io.IOException: Error in deleting blocks.
at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:625)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:528)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
at java.lang.Thread.run(Thread.java:595)
These come up with all sorts of block ids; they do not appear too
often, but they do recur on a regular basis.
0.15.x hbase was doing updates against files that had been removed by
another thread, which made for strange errors in hdfs. That said, I
don't recall having seen the above. Do you have the namenode at DEBUG
level? If so, try tracing the above problematic blocks therein and see
if you can figure out a story as to what happened to these blocks.
If HDFS is in an inconsistent state, hbase will be inconsistent too.
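By "tracing the blocks" I mean something along these lines; the
namenode log path is an assumption patterned on your datanode and
regionserver log paths, and you would substitute the block ids you are
actually seeing:

  # Pull every mention of a suspect block out of the namenode log to
  # see when it was allocated, replicated, and invalidated.
  grep 'blk_-8931657506153335343' /usr/local/hadoop/logs/hadoop-*-namenode-*.log*

With the namenode at DEBUG you should get enough to follow a block's
life; at the default level there is less detail, but it can still tell
a story.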
3. Misc
I see these too, from this morning:
2008-01-04 08:23:30,616 ERROR org.apache.hadoop.hbase.HRegionServer:
unable to process message: MSG_REGION_OPEN : regionname:
docs,DC20020096869_20020725,43610073395851568, startKey:
<DC20020096869_20020725>, tableDesc: {name: docs, families:
{contents:={name: contents, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none},
language:={name: language, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none},
mimetype:={name: mimetype, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none}}}
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:903)
at java.lang.Thread.run(Thread.java:595)
I'd guess this file is made of the above-cited problematic blocks.
Can you find it on hdfs? Can you download it? (Try running
'./bin/hadoop fsck /HBASE_DIR' and see what it says.)
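Concretely, something like this; a sketch only, using the file from
your log and the fsck options as of 0.15.x, run from your hadoop
install directory:

  # Report the health of everything under /hbase, with per-file block
  # and location detail.
  ./bin/hadoop fsck /hbase -files -blocks -locations

  # Try pulling the problematic mapfile out of hdfs; if its blocks are
  # missing this should fail the same way the region server does.
  ./bin/hadoop fs -get /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633 /tmp/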
Another one is this:
==>
/usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-62.worldlingo.com.log
<==
2008-01-04 08:16:32,001 WARN org.apache.hadoop.hbase.HRegion: Region
docs,DC20020099792_20020725,9149203683830573099 is NOT splitable
though its aggregate size is 111.4m and desired size is 64.0m
These come up with different region numbers.
This is ok. We want to split the region because it's > 64MB, but this
region has outstanding references to another, parent region, so it is
not yet splittable (it should have turned splittable a little later in
your log).
And another one:
2008-01-03 11:27:55,437 WARN org.apache.hadoop.hbase.HStore: Failed
getting store size
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist
at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
Fixed in TRUNK
And another one:
2008-01-03 15:43:39,590 WARN org.apache.hadoop.dfs.DataNode: Got
exception while serving blk_3676251342939485484 to /192.168.105.21:
java.io.IOException: Block blk_3676251342939485484 is not valid.
at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:528)
at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1051)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:843)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
at java.lang.Thread.run(Thread.java:595)
Again, they come with different block numbers.
This hdfs exception may be recoverable IIRC; hdfs gets the block
elsewhere.
4. Inserting a document with errors
If I try to add one of those documents for which I get an error back,
this is what I see in the logs so far:
Uploading -> DC20020099841_20020725
Server 20:
....
2007-12-30 17:26:42,392 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2007-12-30 17:26:42,395 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:60030
2007-12-30 17:26:42,395 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2007-12-30 17:26:42,396 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 60020: starting
2007-12-30 17:26:42,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 60020: starting
2007-12-30 17:26:42,402 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 60020: starting
Please set these to run at DEBUG level, at least while we are trying
to figure out what's going on (the log4j snippet above will do it).
Server 26:
2008-01-04 12:18:28,125 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-04 12:18:28,131 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-04 12:18:28,147 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
The above are bad errors. hdfs lost our files or, more likely, we
mangled them when writing.
...
Does this help?
Yes.
St.Ack
Thanks,
Lars
stack wrote:
Lars George wrote:
Hi,
I have inserted about 3.5m documents into a single two-column table
in HBase running on 32 nodes. So far I have been able to insert most
of the data, but with the last million or so I am stuck with this error:
org.apache.hadoop.hbase.WrongRegionException: Requested row out of
range for HRegion docs,DC20020099792_20020725,9149203683830573099,
startKey='DC20020099792_20020725', endKey='DC20020099792_20020725',
row='DC20020099841_20020725'
This happens every time you try to do an insert?
Querying for the document returns nothing, meaning it looks as if the
document does not exist, although I am sure I tried inserting it a
few times. Deleting or trying to re-insert returns the above error,
both through the API (using HTable) and through the HBase shell.
I tried a restart of Hadoop/HBase to no avail. How do I fix this
problem? Any help is appreciated.
Do you have DEBUG enabled for hbase (see
http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#4)? Do the logs tell
you anything more, e.g. any interesting exceptions?
Which hbase version?
St.Ack