[
https://issues.apache.org/jira/browse/HBASE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653587#action_12653587
]
Andrew Purtell commented on HBASE-1045:
---------------------------------------
Based on the regionserver log in question, it looks like the shutdown thread
completed quickly after the OOME (thrown in the IPC server's select), but for
some reason the region was not reassigned in a timely manner. Clients kept
finding the old region location in meta and went to that server, but nothing
was listening on the socket, hence:
SEVERE: Failed write of Record .... java.io.IOException: Call to 10.30.94.37:60020 failed on local exception: Connection refused.
I did not have our service recovery framework running. Had it been running,
another region server would have been launched and would (eventually) have
thrown NSREs (NotServingRegionExceptions).
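A minimal sketch of how a client library could "try harder" here, assuming hypothetical `MetaLookup` and `ServerCall` interfaces (these are not HBase APIs). When a call fails with a `ConnectException` anywhere in its cause chain, as in the trace below, the client drops its cached location, pauses, looks the row up in meta again, and retries a bounded number of times. Any other I/O error is rethrown unchanged.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: treat "connection refused" as a sign that the cached location is stale.
 * Caches per row for brevity; a real client would cache per region.
 */
class RelocatingClient {
    /** Hypothetical: looks up the server currently hosting a row in the catalog (meta). */
    interface MetaLookup {
        String locate(String row) throws IOException;
    }

    /** Hypothetical: sends a write for a row to a specific server address. */
    interface ServerCall {
        void write(String server, String row) throws IOException;
    }

    private final Map<String, String> locationCache = new HashMap<>();
    private final MetaLookup meta;
    private final ServerCall rpc;
    private final int maxAttempts;
    private final long pauseMillis;

    RelocatingClient(MetaLookup meta, ServerCall rpc, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.meta = meta;
        this.rpc = rpc;
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    /** True if any exception in the cause chain is a ConnectException (nothing listening). */
    static boolean isConnectionRefused(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ConnectException) return true;
        }
        return false;
    }

    void write(String row) throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            String server = locationCache.get(row);
            if (server == null) {
                server = meta.locate(row);
                locationCache.put(row, server);
            }
            try {
                rpc.write(server, row);
                return;
            } catch (IOException e) {
                // Only connection failures suggest a stale location; rethrow anything else.
                if (!isConnectionRefused(e) || attempt >= maxAttempts) throw e;
                // Forget the cached address so the next attempt re-reads meta.
                locationCache.remove(row);
                // Meta may still point at the dead server until the region is
                // reassigned, so pause before looking it up again.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```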
> Hangup by regionserver causes write to fail
> -------------------------------------------
>
> Key: HBASE-1045
> URL: https://issues.apache.org/jira/browse/HBASE-1045
> Project: Hadoop HBase
> Issue Type: Bug
> Components: client
> Reporter: Andrew Purtell
> Fix For: 0.19.0
>
>
> The root cause is an OOME on the region server. Nonetheless, a hangup during
> IPC causes the client to fail the write, which currently results in data loss.
> Should the application catch and retry, or should the client libraries try
> harder?
> Dec 4, 2008 5:25:30 PM com.powerset.heritrix.writer.HBaseWriterProcessor innerProcessResult
> SEVERE: Failed write of Record: http://www.publicrecordslocal.com/georgia.htm (in thread 'ToeThread #9: http://www.publicrecordslocal.com/georgia.htm'; in processor 'Archiver')
> java.io.IOException: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>     at com.powerset.heritrix.writer.HBaseWriter.write(Unknown Source)
>     at com.powerset.heritrix.writer.HBaseWriterProcessor.write(Unknown Source)
>     at com.powerset.heritrix.writer.HBaseWriterProcessor.innerProcessResult(Unknown Source)
>     at org.archive.modules.Processor.process(Processor.java:123)
>     at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:310)
>     at org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
> Caused by: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>     at org.apache.hadoop.ipc.Client.call(Client.java:699)
>     at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:323)
>     at $Proxy12.batchUpdates(Unknown Source)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:919)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:917)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerForWithoutRetries(HConnectionManager.java:875)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:916)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1267)
>     at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1238)
>     at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1218)
>     at net.iridiant.content.Content.storeURLInfo(Unknown Source)
>     ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
>     at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
>     at org.apache.hadoop.ipc.Client.call(Client.java:685)
>     ... 16 more
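On the application side, a minimal catch-and-retry sketch, assuming a hypothetical `Commit` callback that wraps whatever performs the write (for example the `HTable.commit` call in the trace above). It retries on `IOException` with exponential backoff capped at `maxMillis`, and rethrows the last failure after `maxAttempts` so the caller can log or requeue the record rather than silently lose it. The method and parameter names are illustrative only.

```java
import java.io.IOException;

final class RetryingWriter {
    /** Hypothetical stand-in for the application's write, e.g. a batch commit. */
    @FunctionalInterface
    interface Commit {
        void run() throws IOException;
    }

    private RetryingWriter() {}

    /** Delay before retry number {@code attempt} (1-based): base * 2^(attempt-1), capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        // Double until we reach the cap; stopping early avoids overflow.
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /**
     * Runs {@code commit}, making at most {@code maxAttempts} attempts in total.
     * Rethrows the last IOException so the caller decides what to do with the record.
     */
    static void commitWithRetry(Commit commit, int maxAttempts, long baseMillis, long maxMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        for (int attempt = 1; ; attempt++) {
            try {
                commit.run();
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
            }
        }
    }
}
```

For example, `commitWithRetry(() -> table.commit(update), 5, 1000, 30000)` would wait 1, 2, 4 and 8 seconds between its five attempts; `table` and `update` are hypothetical here.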