[
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368375#comment-14368375
]
Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------
bq. The client ultimately requests the server return a batch of size
'hbase.client.scanner.max.result.size' and then believe that the server
returned less data than that limit.
Exactly correct. The client looks at the Results returned from the server and
from its point of view it sees that neither the maxResultSize nor the caching
limit has been reached. The only explanation it can come up with as to why the
server would return these Results is that it must have exhausted the region
(otherwise it has no reason to stop accumulating Results). But the server
stopped because from its PoV the size limit was reached. There is a
miscommunication.
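
To make the miscommunication concrete, here is a minimal, self-contained sketch. It is not the actual HBase client or server code: the class name, method names, limits, and per-cell sizes are all made up for illustration. It only shows how two slightly different size estimates for the same cells can lead the two sides to opposite conclusions.

```java
public class ScanGuessSketch {
    // Sum of per-cell size estimates, as computed by one side of the connection.
    static long totalSize(long[] cellSizes) {
        long total = 0;
        for (long size : cellSizes) {
            total += size;
        }
        return total;
    }

    // Server side (hypothetical): stop accumulating once the size limit is reached.
    static boolean serverHitSizeLimit(long[] serverCellSizes, long maxResultSize) {
        return totalSize(serverCellSizes) >= maxResultSize;
    }

    // Client side (hypothetical): if neither limit appears to be reached,
    // the client concludes that the region must have been exhausted.
    static boolean clientAssumesRegionExhausted(long[] clientCellSizes, int rowsReturned,
                                                long maxResultSize, int caching) {
        boolean sizeLimitReached = totalSize(clientCellSizes) >= maxResultSize;
        boolean cachingLimitReached = rowsReturned >= caching;
        return !sizeLimitReached && !cachingLimitReached;
    }

    public static void main(String[] args) {
        long maxResultSize = 100;           // made-up size limit
        int caching = 10;                   // made-up row limit
        long[] serverSizes = {40, 40, 40};  // server's estimates: 120 in total
        long[] clientSizes = {32, 32, 32};  // client's estimates for the same cells: 96 in total

        System.out.println("server hit size limit: "
            + serverHitSizeLimit(serverSizes, maxResultSize));
        System.out.println("client assumes region exhausted: "
            + clientAssumesRegionExhausted(clientSizes, 3, maxResultSize, caching));
    }
}
```

With these made-up numbers both checks come out true: the server stopped because of the size limit, yet the client decides the region is finished and would move on to the next one, skipping the remaining rows.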
bq. I still don't completely understand what is causing the difference on the
server-side in the first place (over 0.98)
Ya, it's a little cryptic because the exact same function is used to calculate
the size server side and client side. I would recommend adding some logs that
allow you to see the estimatedHeapSize of a cell server side versus client
side and see where they differ. My guess would be that somehow the Cell on the
client side returns a slightly lower heap size estimation than the SAME Cell on
the server (I don't believe it's related to the NextState size bubbling up
since NextState is only in branch-1+ and the issue is branch-1.0+). Maybe the
Cells/Results are serialized in such a way that these calculations are slightly
different? Somehow the server's size calculation is larger than the client's
size calculation.
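
Once both sides log a per-cell size, the two logs can be compared cell by cell. The sketch below assumes the logged sizes have already been collected into two arrays in the same cell order; the class name, helper name, and sample values are hypothetical and are not taken from HBase.

```java
public class SizeDiffSketch {
    // Returns the index of the first cell whose two estimates differ,
    // or -1 if both sides report identical sizes for every cell.
    // If one log is shorter, the first missing index counts as a mismatch.
    static int firstMismatch(long[] serverSizes, long[] clientSizes) {
        int common = Math.min(serverSizes.length, clientSizes.length);
        for (int i = 0; i < common; i++) {
            if (serverSizes[i] != clientSizes[i]) {
                return i;
            }
        }
        return serverSizes.length == clientSizes.length ? -1 : common;
    }

    public static void main(String[] args) {
        long[] serverSizes = {40, 40, 48};  // made-up values logged on the server
        long[] clientSizes = {40, 40, 40};  // made-up values logged on the client

        int index = firstMismatch(serverSizes, clientSizes);
        if (index < 0) {
            System.out.println("estimates agree for every cell");
        } else {
            System.out.println("estimates first differ at cell " + index);
        }
    }
}
```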
However, even when we do understand why the server's size calculation is
different from the client's it may not help (of course we can only know once
the issue has been identified). Like you said, the underlying problem is that
the client shouldn't even be performing a size calculation but should rather be
told by the server why the Results were returned. As long as there is a
possibility for the server and client to disagree on why the Results were
returned, it is possible to incorrectly jump between regions. Fixing the size
calculation may be sufficient for resolving this issue, but going forward I
think your idea of passing information back to the client in the ScanResult
will be the best way to go.
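
As a rough illustration of that direction, the sketch below has the server attach an explicit reason to each batch so the client never has to infer one from sizes. Everything here is hypothetical: the enum values, the ScanBatch class, and the method name are invented for this example and are not the contents of NextState$State or the real ScanResult message.

```java
public class ScanReasonSketch {
    // Hypothetical reasons a server might stop filling a batch.
    enum StopReason { SIZE_LIMIT_REACHED, CACHING_LIMIT_REACHED, REGION_EXHAUSTED }

    // Hypothetical response: how many rows came back, plus the server's stated reason.
    static final class ScanBatch {
        final int rowCount;
        final StopReason reason;

        ScanBatch(int rowCount, StopReason reason) {
            this.rowCount = rowCount;
            this.reason = reason;
        }
    }

    // The client trusts the server's reason instead of recomputing sizes.
    static boolean shouldMoveToNextRegion(ScanBatch batch) {
        return batch.reason == StopReason.REGION_EXHAUSTED;
    }

    public static void main(String[] args) {
        ScanBatch sizeLimited = new ScanBatch(3, StopReason.SIZE_LIMIT_REACHED);
        ScanBatch exhausted = new ScanBatch(2, StopReason.REGION_EXHAUSTED);

        System.out.println("size-limited batch, move on: " + shouldMoveToNextRegion(sizeLimited));
        System.out.println("exhausted region, move on: " + shouldMoveToNextRegion(exhausted));
    }
}
```

With an explicit reason, a disagreement between the two size calculations can no longer cause the client to skip the rest of a region, because the client never consults its own size estimate for that decision.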
bq. Ultimately, the underlying problem is likely best addressed from the stance
that a scanner shouldn't be performing special logic based on the size of the
batch of data returned from a server
Agreed
bq. The server already maintains a nice enum of the reason which it returns a
batch of results to a client via NextState$State
Just a note: NextState was introduced with HBASE-11544 which has only been
backported to branch-1+ at this point. Since this issue appears in branch-1.0+,
returning the NextState$State enum would require backporting that feature
further.
bq. I'm currently of the opinion that it's ideal to pass this information back
to the client via the ScanResult
I agree that somehow we need to communicate the reasoning behind why these
Results were returned to the client rather than looking at the Result[] and
making an "educated" guess.
bq. 0.98 clients running against 1.x could see this problem, although I have
not tested that to confirm it happens.
I suspect you're correct.
> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
> Key: HBASE-13262
> URL: https://issues.apache.org/jira/browse/HBASE-13262
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0, 1.1.0
> Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Blocker
> Fix For: 2.0.0, 1.1.0
>
> Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]),
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of
> the actual rows. Running against 1.0.0, returns all 10M records as expected.
> [Code I was
> running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
> for the curious.