[
https://issues.apache.org/jira/browse/HBASE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658198#action_12658198
]
Jean-Daniel Cryans commented on HBASE-1045:
-------------------------------------------
Currently, batched updates are retried indefinitely, with 1 second between each
call to an HRS. This hack is there until we can refactor the handling of
getting/updating batches of rows. What happened to Andrew is that he got an IOE
after 100 and 200+ NSREs: NSREs are always retried, but the IOE that was
eventually thrown is currently not caught.
What we could do to improve the situation:
- Add a max number of retries and use exponential backoff. This would
worsen the hack by duplicating the HCM stuff in HTable, but we will refactor it
for 0.20.
- Catch the IOE now that it won't get retried forever.
- If we are committing a batch of rows, we should do the retries with only 1
row so that we don't OOME region servers.
This way, a single-row commit would at least know what failed.
A nice-to-have would be to inform HMaster that something is wrong with an HRS.
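
For illustration, the first three points could be sketched roughly as follows.
This is not HBase code: BatchRetrySketch, RowSender, backoffMs, sendOneRow and
commit are hypothetical names, rows are plain strings, and the delays are
arbitrary. The batch is sent once; if that throws an IOE, each row is retried on
its own a bounded number of times with exponential backoff, and the rows that
still fail are returned so the caller knows what was not written.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BatchRetrySketch {

    /** Hypothetical stand-in for the RPC that ships rows to a region server. */
    interface RowSender {
        void send(List<String> rows) throws IOException;
    }

    /** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
    static long backoffMs(long baseMs, int attempt, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    /** Sends one row, making at most maxRetries attempts. Returns true on success. */
    static boolean sendOneRow(String row, RowSender sender,
                              int maxRetries, long baseMs, long maxMs) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                sender.send(Collections.singletonList(row));
                return true;
            } catch (IOException e) {
                if (attempt + 1 == maxRetries) {
                    return false; // retries exhausted: give up instead of looping forever
                }
                try {
                    Thread.sleep(backoffMs(baseMs, attempt, maxMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return false;
                }
            }
        }
        return false; // only reached when maxRetries <= 0
    }

    /**
     * Sends the whole batch once. If that fails, retries row by row so that a
     * struggling server never receives the full batch again. Returns the rows
     * that could not be written.
     */
    static List<String> commit(List<String> rows, RowSender sender,
                               int maxRetries, long baseMs, long maxMs) {
        try {
            sender.send(rows);
            return new ArrayList<>();
        } catch (IOException e) {
            // The IOE is caught here rather than propagating on the first failure.
        }
        List<String> failed = new ArrayList<>();
        for (String row : rows) {
            if (!sendOneRow(row, sender, maxRetries, baseMs, maxMs)) {
                failed.add(row);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated server: rejects multi-row batches and always rejects "bad".
        RowSender flaky = batch -> {
            if (batch.size() > 1 || batch.contains("bad")) {
                throw new IOException("simulated failure");
            }
        };
        List<String> failed = commit(Arrays.asList("r1", "bad", "r2"), flaky, 3, 10, 100);
        System.out.println("rows that failed: " + failed);
    }
}
```

The sketch assumes that resending a row which may already have been applied is
harmless; whether that holds depends on the update being idempotent.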
> Hangup by regionserver causes write to fail
> -------------------------------------------
>
> Key: HBASE-1045
> URL: https://issues.apache.org/jira/browse/HBASE-1045
> Project: Hadoop HBase
> Issue Type: Bug
> Components: client
> Reporter: Andrew Purtell
> Fix For: 0.19.0
>
>
> Root cause is OOME on the region server. Nonetheless a hangup during IPC
> causes the client to fail the write, currently causing data loss. Should the
> application catch and retry? Or should the client libraries try harder?
> Dec 4, 2008 5:25:30 PM com.powerset.heritrix.writer.HBaseWriterProcessor innerProcessResult
> SEVERE: Failed write of Record: http://www.publicrecordslocal.com/georgia.htm (in thread 'ToeThread #9: http://www.publicrecordslocal.com/georgia.htm'; in processor 'Archiver')
> java.io.IOException: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>     at com.powerset.heritrix.writer.HBaseWriter.write(Unknown Source)
>     at com.powerset.heritrix.writer.HBaseWriterProcessor.write(Unknown Source)
>     at com.powerset.heritrix.writer.HBaseWriterProcessor.innerProcessResult(Unknown Source)
>     at org.archive.modules.Processor.process(Processor.java:123)
>     at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:310)
>     at org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
> Caused by: java.io.IOException: Call to /10.30.94.38:60020 failed on local exception: Connection refused
>     at org.apache.hadoop.ipc.Client.call(Client.java:699)
>     at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:323)
>     at $Proxy12.batchUpdates(Unknown Source)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:919)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:917)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerForWithoutRetries(HConnectionManager.java:875)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:916)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1267)
>     at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1238)
>     at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1218)
>     at net.iridiant.content.Content.storeURLInfo(Unknown Source)
>     ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
>     at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
>     at org.apache.hadoop.ipc.Client.call(Client.java:685)
>     ... 16 more