[
https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366412#comment-14366412
]
Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------
bq. Are you implying that this is specifically the problem? I'm not seeing
where these sizes are used for anything more than metrics tracking
So within {{RSRpcServices#scan(...)}} we keep a running tally of the size of
the accumulated {{Result}}s in the variable {{currentScanResultSize}}. We
collect the {{Result}}s in a while loop that runs until the caching limit is
reached. At the beginning of each iteration of this loop, we check the running
Result size against {{maxResultSize}}. If the size limit has been reached, we
break out of the loop and return whatever Results we have accumulated thus far
to the client. The problem is that we then expect the client to realize that
the Results it received have reached {{maxResultSize}}. If the client's size
calculation comes out lower than the server's, the client may misinterpret the
response as meaning the region has been exhausted.
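To make the mismatch concrete, here is a minimal, self-contained simulation of
the loop described above. It is not the actual HBase code: the class
{{ScanSizeMismatchSketch}}, both method names, the fixed per-row sizes, and the
client-side heuristic are all assumptions made up for illustration. Only
{{currentScanResultSize}}, {{maxResultSize}}, and the caching limit come from
the description.

```java
/** Hypothetical simulation of the scan loop described above; not actual HBase code. */
final class ScanSizeMismatchSketch {

    /**
     * Server side (assumed shape): add rows to the response until the caching
     * limit is hit, the region runs out of rows, or the running size tally
     * reaches maxResultSize. Returns the number of rows in the response.
     */
    static int serverBatchSize(int rowsLeftInRegion, long serverBytesPerRow,
                               int caching, long maxResultSize) {
        long currentScanResultSize = 0;
        int returned = 0;
        while (returned < caching && returned < rowsLeftInRegion) {
            // Size check at the top of each iteration, as in the description.
            if (currentScanResultSize >= maxResultSize) {
                break;
            }
            currentScanResultSize += serverBytesPerRow;
            returned++;
        }
        return returned;
    }

    /**
     * Client side (assumed heuristic): if neither the caching limit nor the
     * size limit appears to have been reached, conclude the region is done.
     */
    static boolean clientAssumesRegionExhausted(int rowsReceived, long clientBytesPerRow,
                                                int caching, long maxResultSize) {
        boolean cachingReached = rowsReceived >= caching;
        boolean sizeReached = rowsReceived * clientBytesPerRow >= maxResultSize;
        return !cachingReached && !sizeReached;
    }

    public static void main(String[] args) {
        // The region still holds 50 rows; the server sizes each row at 100 bytes.
        int rows = serverBatchSize(50, 100L, 100, 1000L);
        System.out.println("rows in response: " + rows);
        // The client sizes the same rows at 90 bytes each, so it never sees the limit.
        System.out.println("client assumes region exhausted: "
                + clientAssumesRegionExhausted(rows, 90L, 100, 1000L));
    }
}
```

With these made-up numbers the server stops after 10 rows because its tally
reached 1000 bytes, while the client computes 900 bytes, sees neither limit
hit, and moves on even though 40 rows remain in the region.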
bq. To me, the larger issue seems to be that only a Result[] is returned from
ScannerCallable
I agree completely. It is ugly that we return ONLY a {{Result[]}} to the client
and then expect them to understand why those are the Results that were returned
from the server. Was the size limit reached? Was the caching limit reached? Was
a partial result formed? Are there more results on the server or is the region
exhausted? There are too many things that the client needs to infer from the
{{Result[]}} alone that the server already had the answer to. I think it would
be great if this could be cleaned up.
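As one possible shape for such a cleanup, the sketch below wraps the results
in a small object that carries the server's own answers to those questions.
Everything here is hypothetical: {{ScanBatchSketch}}, {{StopReason}}, and the
accessor names are invented for illustration and are not an existing or
proposed HBase API, and plain strings stand in for {{Result}}.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical response wrapper; not an existing or proposed HBase API. */
final class ScanBatchSketch {

    /** Why the server stopped adding results to this batch. */
    enum StopReason { SIZE_LIMIT_REACHED, CACHING_LIMIT_REACHED, REGION_EXHAUSTED }

    private final List<String> results;      // stand-in for Result[]
    private final StopReason stopReason;
    private final boolean lastResultPartial;

    private ScanBatchSketch(List<String> results, StopReason stopReason,
                            boolean lastResultPartial) {
        this.results = Collections.unmodifiableList(results);
        this.stopReason = stopReason;
        this.lastResultPartial = lastResultPartial;
    }

    static ScanBatchSketch of(StopReason stopReason, boolean lastResultPartial,
                              String... results) {
        return new ScanBatchSketch(Arrays.asList(results.clone()), stopReason,
                lastResultPartial);
    }

    List<String> results() { return results; }

    StopReason stopReason() { return stopReason; }

    boolean lastResultPartial() { return lastResultPartial; }

    /** The client reads this flag instead of re-deriving it from result sizes. */
    boolean moreResultsInRegion() {
        return stopReason != StopReason.REGION_EXHAUSTED;
    }

    public static void main(String[] args) {
        ScanBatchSketch batch =
                of(StopReason.SIZE_LIMIT_REACHED, false, "row1", "row2");
        System.out.println("stopped because: " + batch.stopReason());
        System.out.println("more results in region: " + batch.moreResultsInRegion());
    }
}
```

With the stop reason sent explicitly, the client no longer depends on its size
calculation agreeing with the server's.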
> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
> Key: HBASE-13262
> URL: https://issues.apache.org/jira/browse/HBASE-13262
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0, 1.1.0
> Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Blocker
> Fix For: 2.0.0, 1.1.0
>
> Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]),
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of
> the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was
> running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java]
> for the curious.