[ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855648#action_12855648 ]

stack commented on HBASE-2248:
------------------------------

So, it's not a lockup; rather, stuff is working, but really, really slowly.  It 
seems to be this patch, because going back to a clean hadoop 0.20.2 and the 
current state of the pre_durability branch, everything runs fine again (until we 
hit an actual deadlock, i.e. the known deadlock issue).  I'll spend more time 
trying to figure it out, but here is how it looks when you take a thread dump:

Most threads are 'WAITING', etc., and then a good few are BLOCKED like the below:

{code}
"IPC Server handler 36 on 60020" daemon prio=10 tid=0x273e4400 nid=0x2888 
waiting for monitor entry [0x257ad000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:646)
        - waiting to lock <0x3d5ec6a0> (a java.lang.Object)
{code}

Invariably, there is one 'abnormal' BLOCKED thread that looks the same as the 
above -- same line number and everything, excepting thread names etc. -- except 
that, instead of 'waiting to lock', it shows 'locked'.

I'll keep digging.
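The dump pattern above -- one thread 'locked' at HLog.append and many more 'waiting to lock' the same monitor address -- is classic synchronized-block contention on a single lock object. Here is a minimal standalone sketch (class and variable names are hypothetical, not HBase code) that reproduces the same BLOCKED state you would see in jstack output:

```java
import java.util.concurrent.CountDownLatch;

public class MonitorContentionDemo {
    // Stand-in for the single lock object guarding the append path (hypothetical name).
    private static final Object appendLock = new Object();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);

        // One thread holds the monitor for a while -- the lone 'locked' thread in the dump.
        Thread slowAppender = new Thread(() -> {
            synchronized (appendLock) {
                lockHeld.countDown();
                try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            }
        });

        // Every other handler piles up here -- the 'waiting to lock <...>' threads.
        Thread blockedAppender = new Thread(() -> {
            synchronized (appendLock) { /* would do the append here */ }
        });

        slowAppender.start();
        lockHeld.await();            // make sure the monitor is taken first
        blockedAppender.start();

        // Wait (bounded) until the second thread is actually parked on the monitor.
        long deadline = System.currentTimeMillis() + 5000;
        while (blockedAppender.getState() != Thread.State.BLOCKED
                && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
        System.out.println("second appender: " + blockedAppender.getState());

        slowAppender.join();
        blockedAppender.join();
    }
}
```

A thread dump taken while this runs shows the second thread as BLOCKED (on object monitor), waiting to lock the same monitor the first thread holds -- the same shape as the handler threads above.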

> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>            Priority: Blocker
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, 
> HBASE-2248-GetsAsScans3.patch, HBASE-2248-rr-alpha3.txt, 
> HBASE-2248-rr-pre-durability2.txt, HBASE-2248-rr-pre-durability3.txt, 
> hbase-2248.gc, HBASE-2248.patch, hbase-2248.txt, readownwrites-lost.2.patch, 
> readownwrites-lost.patch, Screen shot 2010-02-23 at 10.33.38 AM.png, 
> threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a 
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when 
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short 
> scans.  Some of our data represent a time series.  The data is stored in time 
> series order, MR jobs often insert/update new data at the end of the series, 
> and queries usually have to pick up some or all of the series.  These are 
> often scans of 0-100 rows at a time.  To load one page, we'll observe about 
> 20 such scans being triggered concurrently, and they take 2 seconds to 
> complete.  Doing a thread dump of a region server shows many threads in 
> ConcurrentSkipListMap.buildFromSorted, which traverses the entire map of key 
> values to copy it.  
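The copy cost the description refers to can be reproduced outside HBase: ConcurrentSkipListMap's copy constructor for sorted maps goes through buildFromSorted and walks every entry, so taking a snapshot this way is O(n) in the memstore size, paid on every scan open. A minimal sketch (entry count and names are illustrative only):

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class SnapshotCopyDemo {
    public static void main(String[] args) {
        // Stand-in for a memstore with many key-values (size is illustrative).
        ConcurrentSkipListMap<Integer, String> memstore = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 100_000; i++) {
            memstore.put(i, "v" + i);
        }

        long start = System.nanoTime();
        // Copy constructor from a SortedMap uses buildFromSorted: every entry is visited.
        ConcurrentSkipListMap<Integer, String> scanCopy =
                new ConcurrentSkipListMap<>(memstore);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("copied " + scanCopy.size() + " entries in "
                + elapsedMs + " ms");
    }
}
```

With ~20 such copies triggered concurrently per page load, as described above, this per-scan O(n) walk is consistent with the multi-second latencies observed.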

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
