[
https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672469#action_12672469
]
Andrew Purtell commented on HBASE-866:
--------------------------------------
Here is a Hertitrix hbase-writer thread stuck in RPC for over an hour:
[ToeThread #96: http://esinidc.onlinedown.net/down/NXSetup.zip
CrawlURI http://esinidc.onlinedown.net/down/NXSetup.zip LX
http://www.onlinedown.net/soft/2919.htm 0 attempts
in processor: Archiver
ACTIVE for 1h21m41s176ms
step: ABOUT_TO_BEGIN_PROCESSOR for 1h21m2s77ms
Java Thread State: WAITING
Blocked/Waiting On: NONE
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:695)
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
$Proxy12.batchUpdates(Unknown Source)
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:953)
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:951)
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:845)
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:950)
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
net.iridiant.heritrix.writer.HBaseWriter.write(Unknown Source)
net.iridiant.heritrix.writer.HBaseWriterProcessor.write(Unknown Source)
net.iridiant.heritrix.writer.HBaseWriterProcessor.innerProcessResult(Unknown
Source)
org.archive.modules.Processor.process(Processor.java:123)
org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:310)
org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
]
I have a pool of 100 TOE threads all stuck in Archiver, a few in RPC, but only
on one node. There are two other nodes still writing at full speed into HBase.
Intermittent synchronization/wake problem on the write path? Needs
investigation.
> Blocking for ten minutes at a time
> ----------------------------------
>
> Key: HBASE-866
> URL: https://issues.apache.org/jira/browse/HBASE-866
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Minor
> Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase. My jobs
> consistently fail with child task timing out its ten minute period. Adding
> logging, was able to see that we're actual stuck in a commit. Following the
> thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memcache flush for region TestTable,0029377106,1220466998108. Current
> region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 1 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 16 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 1 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 4 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 6 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 2 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 12 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 9 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 7 on 60020' on region
> TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking
> 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memcache flush for region TestTable,0029377106,1220466998108 in
> 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memcache flush for region TestTable,0029377106,1220466998108. Current
> region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memcache flush for region TestTable,0029377106,1220466998108 in
> 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
> for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server
> handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs. My guess is
> that the flush that is going on concurrent with the blocking is clearing the
> flag
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.