[
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366299#comment-14366299
]
Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------
I think I may have found the issue: it looks like it is caused by a mismatch
between the server-side and client-side estimates of the result size.
Some background:
* In a scan, the method Scan#setMaxResultSize is exposed to allow the user to
restrict the size (in memory) of the Results returned from the server.
* To enforce this size limit, the server calculates the size of each Result
prior to adding it to the list of Results it will send back to the client.
** In 0.98, this size calculation is performed on line 3238 inside
HRegionServer.java
** In branch-1.0, this size calculation is performed on line 2099 inside
RSRpcServices.java
* When the client receives the Results back from the server, it repeats the
size calculation.
** The client does this so that it can infer whether or not this response was
sent back because the size limit was reached (in which case there may be more
Results within the current region that still need to be scanned).
** If the client calculates the size of the Results and sees that the size
limit has NOT been hit, it assumes that the current region has been exhausted
and moves the scanner to the next region.
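The client-side inference described above can be sketched roughly as follows.
This is a simplified, hypothetical model and not the actual ClientScanner code:
the class and method names are made up for illustration, each Result is
represented only by its estimated size, and the caching (row count) limit is
ignored.

```java
import java.util.List;

// Hypothetical, simplified model of the client-side check (not HBase code).
final class ScanLimitCheck {
    /** Sum of the client's own size estimates for the Results in one response. */
    static long estimatedSize(List<Long> resultSizes) {
        long total = 0;
        for (long size : resultSizes) {
            total += size;
        }
        return total;
    }

    /** True if the client believes the response was cut short by the size limit. */
    static boolean sizeLimitReached(List<Long> resultSizes, long maxResultSize) {
        return estimatedSize(resultSizes) >= maxResultSize;
    }

    /** The client moves to the next region only if it thinks the limit was NOT hit. */
    static boolean shouldMoveToNextRegion(List<Long> resultSizes, long maxResultSize) {
        return !sizeLimitReached(resultSizes, maxResultSize);
    }
}
```

Note that the decision depends entirely on the client's own estimate; the
server's estimate never reaches this check.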
The problem here is that there is an implied relationship between the size
calculation on the server and the size calculation on the client: the two sizes
MUST be equal. If the server reaches the size limit, it is implied that the
client should also reach the size limit. This is where the issue occurs.
I added some logs in 0.98, and there this relationship appears to always hold:
the size calculated by the server is always equal to the size calculated by
the client. However, in this test case, this is not true in branch-1.0+. What
I see instead is that the size calculated by the server is LARGER than the
size calculated by the client. The net effect is that the client checks its
size limit and sees that the limit has not been reached, so it assumes that
the region has been exhausted and moves the scanner to the next region. So, as
[~elserj] predicted, the root cause is that we jump between regions too soon.
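To make the effect concrete, here is a small self-contained simulation. It is
only a sketch under stated assumptions, not HBase code: every row is assumed to
have a fixed estimated size, the server is assumed to keep adding rows until its
running estimate reaches the limit, and all names and numbers are hypothetical.

```java
// Hypothetical simulation of the server/client size-estimate mismatch (not HBase code).
final class ScanMismatchSim {
    /**
     * Scans `regions` regions of `rowsPerRegion` rows each and returns how many
     * rows the client ends up seeing.
     */
    static long scan(int regions, int rowsPerRegion,
                     long serverRowSize, long clientRowSize, long maxResultSize) {
        if (maxResultSize <= 0 || serverRowSize <= 0 || clientRowSize <= 0) {
            throw new IllegalArgumentException("sizes and limit must be positive");
        }
        long rowsSeen = 0;
        for (int region = 0; region < regions; region++) {
            int next = 0; // index of the next unread row in this region
            while (true) {
                // Server side: add rows until its own estimate reaches the limit.
                long serverSize = 0;
                int batch = 0;
                while (next + batch < rowsPerRegion && serverSize < maxResultSize) {
                    serverSize += serverRowSize;
                    batch++;
                }
                next += batch;
                rowsSeen += batch;

                // Client side: repeat the calculation with its own estimate.
                long clientSize = (long) batch * clientRowSize;
                boolean limitReached = clientSize >= maxResultSize;
                if (!limitReached) {
                    break; // client assumes the region is exhausted and moves on
                }
            }
        }
        return rowsSeen;
    }
}
```

With these made-up numbers, `ScanMismatchSim.scan(2, 100, 10, 10, 100)` (equal
estimates) returns all 200 rows, while `ScanMismatchSim.scan(2, 100, 12, 10, 100)`
(server estimate larger than the client's) returns only 18: the client sees 90
bytes by its own count after the first batch of each region, concludes the limit
was not hit, and skips the remaining rows.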
The mismatch itself appears to come from a change, between 0.98 and branch-1.0,
in the implementation of the method that is used to calculate the Result size.
I am attaching two test run outputs with some added logging. The way to
interpret the output is as follows:
* A log was added on the server to record the result size and the number of
rows being returned (this is seen as a log from HRegionServer in 0.98 and
RSRpcServices in branch-1.0)
* A log was added on the client to log the remaining result size and also
whether or not the size or caching limit has been reached (seen as a log from
ClientScanner)
We expect the two result sizes to be equal. In the 0.98 output you can see that
they are equal; in branch-1.0 the result size on the server is larger, causing
us to skip to the next region too early. The logging was a little rough, so if
anything needs clarification please let me know.
> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
> Key: HBASE-13262
> URL: https://issues.apache.org/jira/browse/HBASE-13262
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0, 1.1.0
> Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Blocker
> Fix For: 2.0.0, 1.1.0
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]),
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of
> the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was
> running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
> for the curious.