If possible, please move to TRUNK instead. Most of the below have
been addressed there (I can send you a patch if you want to run hbase
TRUNK on hadoop 0.15.x).
Further comments inline below:
Lars George wrote:
Hi Stack,
Yes, it happens every time I insert particular rows. Before, it would
fail every now and then, but now that all the "good" rows have been
inserted I am stuck with the ones that will not insert. And I am sure
they did insert once, with no error. So they are in there in limbo, but
I cannot retrieve, delete, or re-insert them.
The FAQ mentions that I can switch on debugging through the UI, but I
cannot see where. I am using version 0.15.1; is that supposed to have
that option, or do I need to go the log4j.properties-plus-restart
route?
This is a post-0.15 release feature (It says post-0.15.x in the FAQ).
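For 0.15.1 you will have to go the log4j.properties-plus-restart route.
Roughly like the below; this is only a sketch, and the conf path is a
guess based on your log locations, so point it at whichever
log4j.properties your hbase daemons actually read:

  # Enable DEBUG for the hbase classes (and, if you want the hdfs side
  # too, the dfs classes), then restart the master and region servers
  # so the change takes effect.
  echo 'log4j.logger.org.apache.hadoop.hbase=DEBUG' >> /usr/local/hadoop/conf/log4j.properties
  echo 'log4j.logger.org.apache.hadoop.dfs=DEBUG' >> /usr/local/hadoop/conf/log4j.properties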
I have errors all the time, which quite frankly worry me. Here is a
list of what I see so far:
1. At startup
==>
/usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-40.worldlingo.com.log
<==
2008-01-03 14:11:22,512 WARN org.apache.hadoop.util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2008-01-03 14:11:29,808 WARN org.apache.hadoop.hbase.HRegionServer:
Processing message (Retry: 0)
java.io.IOException: java.io.IOException:
java.util.ConcurrentModificationException
....
Fixed in TRUNK
2. Sporadically
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_-8931657506153335343.
Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_3775459202881005176.
Block not found in blockMap.
2008-01-03 21:32:00,639 WARN org.apache.hadoop.dfs.DataNode:
Unexpected error trying to delete block blk_-283089329129695997.
Block not found in blockMap.
2008-01-03 21:32:00,644 WARN org.apache.hadoop.dfs.DataNode:
java.io.IOException: Error in deleting blocks.
at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:625)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:528)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
at java.lang.Thread.run(Thread.java:595)
These come up with all sorts of block ids; they do not appear too
often, but they do recur on a regular basis.
0.15.x hbase was doing updates against files that had been removed by
another thread, which made for strange errors in hdfs. That said, I
don't recall having seen the above. Do you have the namenode at DEBUG
level? If so, try tracing the above problematic blocks therein and see
if you can figure out a story as to what happened to these blocks.
If HDFS is in an inconsistent state, hbase will be inconsistent too.
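By "tracing the blocks" I mean something along these lines; the
namenode log path is an assumption patterned on your datanode and
regionserver log paths, and you would substitute the block ids you are
actually seeing:

  # Pull every mention of a suspect block out of the namenode log to
  # see when it was allocated, replicated, and invalidated.
  grep 'blk_-8931657506153335343' /usr/local/hadoop/logs/hadoop-*-namenode-*.log*

With the namenode at DEBUG you should get enough to follow a block's
life; at the default level there is less detail, but it can still tell
a story.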
3. Misc
I see these too, from this morning:
2008-01-04 08:23:30,616 ERROR org.apache.hadoop.hbase.HRegionServer:
unable to process message: MSG_REGION_OPEN : regionname:
docs,DC20020096869_20020725,43610073395851568, startKey:
<DC20020096869_20020725>, tableDesc: {name: docs, families:
{contents:={name: contents, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none},
language:={name: language, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none},
mimetype:={name: mimetype, max versions: 3, compression: NONE,
in memory: false, max length: 2147483647, bloom filter: none}}}
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:903)
at java.lang.Thread.run(Thread.java:595)
I'd guess this file is made of the above-cited problematic blocks.
Can you find it on hdfs? Can you download it? (Try running
'./bin/hadoop fsck /HBASE_DIR' and see what it says.)
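Concretely, something like this; a sketch only, using the file from
your log and the fsck options as of 0.15.x, run from your hadoop
install directory:

  # Report the health of everything under /hbase, with per-file block
  # and location detail.
  ./bin/hadoop fsck /hbase -files -blocks -locations

  # Try pulling the problematic mapfile out of hdfs; if its blocks are
  # missing this should fail the same way the region server does.
  ./bin/hadoop fs -get /hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633 /tmp/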
Another one is this:
==>
/usr/local/hadoop/logs/hbase-pdc-regionserver-lv1-xen-pdc-62.worldlingo.com.log
<==
2008-01-04 08:16:32,001 WARN org.apache.hadoop.hbase.HRegion: Region
docs,DC20020099792_20020725,9149203683830573099 is NOT splitable
though its aggregate size is 111.4m and desired size is 64.0m
These come up with different region numbers.
This is ok. We want to split the region because it's > 64MB, but this
region has outstanding references to another, parent region, so it is
not yet splittable (it should have turned splittable a little later in
your log).
And another one:
2008-01-03 11:27:55,437 WARN org.apache.hadoop.hbase.HStore: Failed
getting store size
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist
at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
Fixed in TRUNK
And another one:
2008-01-03 15:43:39,590 WARN org.apache.hadoop.dfs.DataNode: Got
exception while serving blk_3676251342939485484 to /192.168.105.21:
java.io.IOException: Block blk_3676251342939485484 is not valid.
at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:528)
at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1051)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:843)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
at java.lang.Thread.run(Thread.java:595)
Again, they come with different block numbers.
This hdfs exception may be recoverable IIRC; hdfs gets the block
elsewhere.
4. Inserting a document with errors
If I try to add one of those documents for which I get an error back,
this is what I see in the logs so far:
Uploading -> DC20020099841_20020725
Server 20:
....
2007-12-30 17:26:42,392 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2007-12-30 17:26:42,395 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:60030
2007-12-30 17:26:42,395 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2007-12-30 17:26:42,396 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 60020: starting
2007-12-30 17:26:42,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 60020: starting
2007-12-30 17:26:42,402 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 60020: starting
2007-12-30 17:26:42,403 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 60020: starting
Please set these to run at DEBUG level, at least while we are trying
to figure out what's going on (the log4j snippet above will do it).
Server 26:
2008-01-04 12:18:28,125 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-04 12:18:28,131 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-04 12:18:28,147 WARN org.apache.hadoop.hbase.HRegionServer:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/hregion_docs,DC20020095856_20020725,7894263634108415584/contents/info/1501965039462307633
at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
The above are bad errors. hdfs lost our files or, more likely, we
mangled them when writing.
...
Does this help?
Yes.
St.Ack
Thanks,
Lars
stack wrote:
Lars George wrote:
Hi,
I have inserted about 3.5m documents into a single two-column table
in HBase running on 32 nodes. So far I have been able to insert most
of the data, but with the last million or so I am stuck with this error:
org.apache.hadoop.hbase.WrongRegionException: Requested row out of
range for HRegion docs,DC20020099792_20020725,9149203683830573099,
startKey='DC20020099792_20020725', endKey='DC20020099792_20020725',
row='DC20020099841_20020725'
This happens every time you try to do an insert?
Querying for the document returns nothing, meaning it looks as if the
document does not exist, although I am sure I tried inserting it a
few times. Deleting or trying to re-insert returns the above error,
both through the API (using HTable) and through the HBase shell.
I tried a restart of Hadoop/HBase to no avail. How do I fix this
problem? Any help is appreciated.
Do you have DEBUG enabled for hbase (see
http://wiki.apache.org/lucene-hadoop/Hbase/FAQ#4)? Do the logs tell
you anything more, e.g. any interesting exceptions?
Which hbase version?
St.Ack