Hi Stack,

Can and will do, but does that make the error go away, i.e. automagically fix it? Or is it broken and nothing can be done about it now?

Lars

stack wrote:
If possible, please move to TRUNK instead. Most of the issues below have been addressed there (I can send you a patch if you want to run hbase TRUNK on hadoop 0.15.x).

Further comments inline below:


Lars George wrote:
Hi Stack,

Yes, it happens every time I insert particular rows. Before, it would fail every now and then, but now that all the "good" rows have been inserted I am stuck with the ones that will not go in. And I am sure they did insert once, with no error. So they are in there in limbo, but I cannot retrieve, delete, or re-insert them.

The FAQ mentions that I can switch on debugging through the UI, but I cannot see where. I am using version 0.15.1; is that supposed to have that option, or do I need to go the log4j.properties-plus-restart route?

This is a post-0.15 release feature (It says post-0.15.x in the FAQ).
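
Until you are on TRUNK, the log4j.properties-plus-restart route is the one to use on 0.15.1. As a rough sketch (assuming the stock conf/log4j.properties that ships with the release; the logger names below are the package names that show up in your stack traces, adjust if your layout differs), add:

    # Turn hbase (and, while we debug this, dfs) logging up to DEBUG
    log4j.logger.org.apache.hadoop.hbase=DEBUG
    log4j.logger.org.apache.hadoop.dfs=DEBUG

then restart the master and region servers so the change is picked up.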


I see errors all the time, which quite frankly worries me. Here is a list of what I have seen so far:

1. At startup

==> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-40.worldlingo.com.log <==
2008-01-03 14:11:22,512 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2008-01-03 14:11:29,808 WARN org.apache.hadoop.hbase.HRegionServer: Processing message (Retry: 0) java.io.IOException: java.io.IOException: java.util.ConcurrentModificationException
....


Fixed in TRUNK




2. Sporadically

2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: Unexpected error trying to delete block blk_-8931657506153335343. Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: Unexpected error trying to delete block blk_3775459202881005176. Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode: Unexpected error trying to delete block blk_-283089329129695997. Block not found in blockMap.
2008-01-03 21:32:00,644 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Error in deleting blocks.
       at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
       at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:625)
       at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:528)
       at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
       at java.lang.Thread.run(Thread.java:595)

These come with all sorts of block IDs; they do not appear too often, but they do recur on a regular basis.

0.15.x hbase was doing updates against files that had been removed by another thread, which made for strange errors in hdfs. That said, I don't recall having seen the above. Do you have the namenode at DEBUG level? If so, try tracing the problematic blocks above through its log and see if you can figure out a story as to what happened to them.
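
For example (a sketch only; the log file names below are guesses based on the paths in your excerpts, so adjust them to match your install):

    # Follow one of the problematic blocks through the namenode and datanode logs
    grep blk_-8931657506153335343 /usr/local/hadoop/logs/hadoop-*-namenode-*.log
    grep blk_-8931657506153335343 /usr/local/hadoop/logs/hadoop-*-datanode-*.log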

If HDFS is in an inconsistent state, hbase will be inconsistent too.

3. Misc

I see these too, from this morning:

2008-01-04 08:23:30,616 ERROR org.apache.hadoop.hbase.HRegionServer: unable to process message: MSG_REGION_OPEN : regionname: docs,DC20020096869_20020725,43610073395851568, startKey: <DC20020096869_20020725>, tableDesc: {name: docs, families: {contents:={name: contents, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, language:={name: language, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, mimetype:={name: mimetype, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
       at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

       at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
       at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
       at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
       at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:903)
       at java.lang.Thread.run(Thread.java:595)



I'd guess this file is made of the problematic blocks cited above. Can you find it on hdfs? Can you download it? (Try running fsck against your hbase directory -- './bin/hadoop fsck /HBASE_DIR' -- and see what it says.)
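
For example (assuming your hbase root is /hbase, as the paths in the exception suggest; the extra flags are optional and worth double-checking against './bin/hadoop fsck' usage on your 0.15 install):

    # Check the hbase directory as a whole, then the specific file from the exception
    ./bin/hadoop fsck /hbase -files -blocks -locations
    ./bin/hadoop fsck "/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633" -files -blocks -locations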



Another one is this:

==> /usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-62.worldlingo.com.log <==
2008-01-04 08:16:32,001 WARN org.apache.hadoop.hbase.HRegion: Region docs,DC20020099792_20020725,9149203683830573099 is NOT splittable though its aggregate size is 111.4m and desired size is 64.0m

These come up with different region numbers.

This is ok. We want to split the region because it's > 64MB, but this region has outstanding references to another, parent region, so it is not yet splittable (it should have turned splittable a little later in your log).


And another one:

2008-01-03 11:27:55,437 WARN org.apache.hadoop.hbase.HStore: Failed getting store size
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist
       at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
       at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
       at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

Fixed in TRUNK



And another one:

2008-01-03 15:43:39,590 WARN org.apache.hadoop.dfs.DataNode: Got exception while serving blk_3676251342939485484 to /192.168.105.21:
java.io.IOException: Block blk_3676251342939485484 is not valid.
       at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:528)
       at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1051)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:843)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
       at java.lang.Thread.run(Thread.java:595)

Again, they come with different block numbers.

This hdfs exception may be recoverable IIRC; hdfs gets the block elsewhere.


4. Inserting document with errors

If I try to add one of the documents for which I get an error back, this is what I see in the logs so far:

Uploading -> DC20020099841_20020725

Server 20:


....
2007-12-30 17:26:42,392 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2007-12-30 17:26:42,395 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:60030
2007-12-30 17:26:42,395 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
2007-12-30 17:26:42,396 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 60020: starting
2007-12-30 17:26:42,397 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: starting
2007-12-30 17:26:42,402 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020: starting



Please set these to run at DEBUG level, at least while we are trying to figure out what's going on.


Server 26:

2008-01-04 12:18:28,125 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
       at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2008-01-04 12:18:28,131 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
       at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2008-01-04 12:18:28,147 WARN org.apache.hadoop.hbase.HRegionServer: java.io.IOException: java.io.IOException: Cannot open filename /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
       at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

The above are bad errors. hdfs lost our files or, more likely, we mangled their writing.

...

Does this help?


Yes.

St.Ack


Thanks,
Lars



stack wrote:
Lars George wrote:
Hi,

I have inserted about 3.5m documents into a single two-column table in HBase running on 32 nodes. So far I have been able to insert most of the data, but with the last million or so I am stuck on this error:

org.apache.hadoop.hbase.WrongRegionException: Requested row out of range for HRegion docs,DC20020099792_20020725,9149203683830573099, startKey='DC20020099792_20020725', endKey='DC20020099792_20020725', row='DC20020099841_20020725'

This happens every time you try to do an insert?

Querying for the document returns nothing, which makes it look like the document does not exist, although I am sure I tried inserting it a few times. Deleting or trying to re-insert it returns the above error, both through the API (using HTable) and through the HBase shell.
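
For reference, the access pattern is roughly the following (a sketch only, written against my recollection of the 0.15.x HTable client; the exact method names may be slightly off):

    // Rough sketch of the read/delete attempts, assuming the 0.15.x HTable client API.
    // Method names (get/startUpdate/delete/commit) are from memory and may differ slightly.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    public class StuckRowCheck {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), new Text("docs"));
        Text row = new Text("DC20020099841_20020725");

        // Reading the stuck row returns nothing, as if the document does not exist
        byte[] contents = table.get(row, new Text("contents:"));
        System.out.println(contents == null ? "no cell returned" : "got " + contents.length + " bytes");

        // Deleting (or re-inserting) the same row fails with WrongRegionException
        long lockid = table.startUpdate(row);
        table.delete(lockid, new Text("contents:"));
        table.commit(lockid);
      }
    }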

I tried restarting Hadoop/HBase, to no avail. How do I fix this problem? Any help is appreciated.

Do you have DEBUG enabled for hbase (see http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#4)? Do the logs tell you anything more, e.g. any interesting exceptions?

Which hbase version?

St.Ack

