[ 
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366299#comment-14366299
 ] 

Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------

I think I may have found the issue, and it looks like it is being caused by a 
mismatch of the result size estimate server side versus client side...

Some background:
* In a scan, the method Scan#setMaxResultSize is exposed to allow the user to 
restrict the size (in memory) of the Results returned from the server. 
* To enforce this size limit, the server calculates the size of each Result 
prior to adding it to the list of Results it will send back to the client.
** In 0.98, this size calculation is performed on line 3238 inside 
HRegionServer.java
** In branch-1.0 this size calculation is performed on line 2099 inside 
RSRpcServices.java ....
* When the client receives the Results back from the server, it repeats the 
size calculation. 
** The client does this so that it can infer whether or not this response was 
sent back because the size limit was reached (in which case there may be more 
Results within the current region that still need to be scanned). 
** If the client calculates the size of the Results and it sees that the size 
limit has NOT been hit, it assumes that the current region has been exhausted, 
and will move the scanner to the next region
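
To make the background above concrete, here is a minimal sketch of the two halves of the contract. It is not the actual HBase code: the class and method names are hypothetical, a Result is reduced to a single estimated size in bytes, and the caching (row count) limit is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; not the real HRegionServer/ClientScanner code.
class SizeLimitSketch {

    // Server side: keep adding Results (modeled here as their estimated sizes)
    // until the accumulated size reaches maxResultSize or the region runs out.
    static List<Long> serverBuildBatch(List<Long> rowSizes, int from, long maxResultSize) {
        List<Long> batch = new ArrayList<>();
        long accumulated = 0;
        for (int i = from; i < rowSizes.size() && accumulated < maxResultSize; i++) {
            accumulated += rowSizes.get(i);
            batch.add(rowSizes.get(i));
        }
        return batch;
    }

    // Client side: repeat the size calculation. If the limit was NOT reached,
    // the client infers that the region has been exhausted.
    static boolean clientAssumesRegionExhausted(List<Long> batch, long maxResultSize) {
        long remaining = maxResultSize;
        for (long size : batch) {
            remaining -= size;
        }
        return remaining > 0;
    }

    public static void main(String[] args) {
        List<Long> region = Collections.nCopies(25, 100L); // 25 rows, 100 bytes each
        List<Long> first = serverBuildBatch(region, 0, 1000L);
        List<Long> last = serverBuildBatch(region, 20, 1000L);
        System.out.println(first.size() + " rows, exhausted=" + clientAssumesRegionExhausted(first, 1000L));
        System.out.println(last.size() + " rows, exhausted=" + clientAssumesRegionExhausted(last, 1000L));
    }
}
```

The client's inference is only valid as long as both sides arrive at the same size for the same Results.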

The problem here is that there is an implied relationship between the size 
calculation on the server and the size calculation on the client: The two sizes 
MUST be equal. If the server reaches the size limit, it is implied that the 
client should also reach the size limit.... This is where the issue occurs.

In 0.98 I added some logs and it looks like this relationship is always true. 
Specifically, the size calculated by the server is always equal to the size 
calculated by the client. However, in this test case, this is not true in 
branch-1.0+. What I see instead is that the size calculated by the server is 
LARGER than the size calculated by the client. The net effect is that the 
client checks its size limit and sees that the limit has not been reached, so 
it assumes that the region has been exhausted and moves the scanner to the next 
region... so as [~elserj] predicted, the root cause is that we jump between 
regions too soon....
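
The effect of the mismatch can be reproduced with a small simulation. This is only a sketch under stated assumptions: the class name, method name, and per-row byte counts are made up for illustration, every row is given the same size, and the caching limit is left out.

```java
// Hypothetical simulation of one region being scanned; not HBase code.
class ScanMismatchSimulation {

    // Returns how many rows of the region the client receives before it
    // decides the region is exhausted and moves on to the next one.
    static int rowsSeenBeforeMovingOn(int rowsInRegion,
                                      long serverBytesPerRow,
                                      long clientBytesPerRow,
                                      long maxResultSize) {
        int next = 0; // index of the next row the server would return
        while (next < rowsInRegion) {
            long serverSize = 0;
            long clientSize = 0;
            // Server fills one response until ITS size estimate hits the limit.
            while (next < rowsInRegion && serverSize < maxResultSize) {
                serverSize += serverBytesPerRow;
                clientSize += clientBytesPerRow; // client's estimate of the same rows
                next++;
            }
            // Client repeats the calculation with ITS estimate.
            if (clientSize < maxResultSize) {
                break; // limit "not reached": client assumes the region is done
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Estimates agree: the whole region is read.
        System.out.println(rowsSeenBeforeMovingOn(1000, 100, 100, 1000));
        // Server estimate is larger: the client leaves after the first response.
        System.out.println(rowsSeenBeforeMovingOn(1000, 120, 100, 1000));
    }
}
```

With equal estimates all 1000 rows are returned. With the larger server estimate, the first response carries 9 rows (1080 bytes by the server's count, 900 by the client's), the client sees the limit as not reached, and the remaining rows of the region are skipped.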

It looks like the root cause is that the implementation of the method used to 
calculate the Result size changed between 0.98 and branch-1.0. 

I am attaching two test run outputs with some added logging. The way to 
interpret the output is as follows:
* A log was added on the server to record the result size and number of rows 
being returned (this is seen as a log from HRegionServer in 0.98 and 
RSRpcServices in branch-1.0)
* A log was added on the client to log the remaining result size and also 
whether or not the size or caching limit has been reached (seen as a log from 
ClientScanner)

We expect the two result sizes to be equal. In the 0.98 run you can see they 
are equal; in the branch-1.0 run the result size on the server is larger, 
causing us to skip to the next region too early. The logging was a little 
rough, so if anything needs clarification please let me know.

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), 
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of 
> the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was 
> running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
>  for the curious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
