[ 
https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432941#comment-13432941
 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

A little more detail. When this happens I see the following pattern:

# a seek to column X (at readpoint 1)
# all versions of *all* columns > X have only a readpoint > 1, hence all need 
to be skipped
# go back to step 1 with column X+1, still at readpoint 1, until all columns 
are exhausted

So for each KV we reseek to, we skip over all KVs larger than this KV. This 
leads to many millions (hundreds of millions) of KVs that are needlessly 
skipped multiple times.
MemStoreScanner.getNext() simply does not find a single KV with the right 
readpoint and iterates all the way to the end (and does so again for each 
reseek).
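
The pattern above can be illustrated with a small simulation. This is only a sketch and not the actual MemStoreScanner code: the class ReseekSkipSketch, its methods, and the use of a TreeMap as a stand-in for the memstore are assumptions made for illustration. It models one KV per column, all written at a point later than the scanner's readpoint, and counts how many KVs are stepped over when every column is reseeked in turn.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not HBase code: column index -> write point of its only KV.
class ReseekSkipSketch {

    // Walk forward from startColumn and count the KVs that must be skipped
    // because their write point is newer than the scanner's readPoint.
    // Stops at the first visible KV, or at the end if none is visible.
    static long skipFrom(TreeMap<Integer, Long> memstore, int startColumn, long readPoint) {
        long skipped = 0;
        for (Map.Entry<Integer, Long> kv : memstore.tailMap(startColumn, true).entrySet()) {
            if (kv.getValue() <= readPoint) {
                break; // a visible KV ends the walk
            }
            skipped++;
        }
        return skipped;
    }

    // Reseek to every column in order, as in steps 1 to 3, and sum the skips.
    static long totalSkips(int columns, long readPoint, long writePoint) {
        TreeMap<Integer, Long> memstore = new TreeMap<>();
        for (int c = 0; c < columns; c++) {
            memstore.put(c, writePoint);
        }
        long total = 0;
        for (int c = 0; c < columns; c++) {
            total += skipFrom(memstore, c, readPoint);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1,000 columns, scanner at readpoint 1, all KVs written at point 2.
        System.out.println(totalSkips(1000, 1L, 2L));
    }
}
```

In this model the number of skipped KVs grows as n(n+1)/2 for n columns, so with tens of thousands of columns the count quickly reaches hundreds of millions or more.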

This is a very pathological scenario. Somehow a previous Get is not finished 
before the next Put inserts (see the sample code in the pastebin linked in the 
description), which seems impossible.

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing list:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has an
> 18-byte qualifier and a 50-byte value. In our tests, when we ran the
> program, we rarely got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4 MB
> each (minus any HBase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
