[ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366412#comment-14366412 ]

Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------

bq. Are you implying that this is specifically the problem? I'm not seeing 
where these sizes are used for anything more than metrics tracking

So within {{RSRpcServices#scan(...)}} we keep a running tally of the size of
the accumulated {{Result}}s in the variable {{currentScanResultSize}}. We
collect the {{Result}}s in a while loop that runs until the caching limit has
been reached. At the beginning of each iteration of this loop, we check the
running Result size against {{maxResultSize}}. If the size limit has been
reached, we break out of the loop and return whatever Results we have
accumulated so far to the client. The problem is that we then expect the client
to work out on its own that the Results it received exceed {{maxResultSize}}.
If the client's size calculation comes out smaller than the server's, the
client sees fewer Results than the caching limit and, by its own count, no size
limit reached, so it can misinterpret the response as meaning the region has
been exhausted.
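
A minimal, self-contained sketch of the mismatch described above, assuming a
much simplified model. It is not HBase code: {{ScanSizeMismatch}},
{{serverScan}}, {{clientThinksExhausted}} and the per-row byte sizes are
hypothetical stand-ins, and sizes are plain numbers rather than HBase's real
size estimates. The server and client are handed different size arrays to
mimic their calculations disagreeing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the scan loop described above; NOT HBase source code.
// Each "row" is represented only by its size in bytes, and the server and the
// client get separate size arrays to mimic disagreeing size calculations.
final class ScanSizeMismatch {

    // Server side: collect row indexes until the caching limit is reached, the
    // rows run out, or the running size reaches maxResultSize (checked at the
    // top of each iteration, as in the comment).
    static List<Integer> serverScan(long[] serverSizes, int start, int caching, long maxResultSize) {
        List<Integer> batch = new ArrayList<>();
        long currentScanResultSize = 0;
        int i = start;
        while (batch.size() < caching && i < serverSizes.length) {
            if (currentScanResultSize >= maxResultSize) {
                break; // size limit hit: return what we have so far
            }
            batch.add(i);
            currentScanResultSize += serverSizes[i];
            i++;
        }
        return batch;
    }

    // Client side: it only sees the rows, so it recomputes the size itself and
    // treats "fewer rows than caching AND under the size limit" as exhausted.
    static boolean clientThinksExhausted(List<Integer> batch, long[] clientSizes,
                                         int caching, long maxResultSize) {
        long clientSize = 0;
        for (int row : batch) {
            clientSize += clientSizes[row];
        }
        return batch.size() < caching && clientSize < maxResultSize;
    }

    public static void main(String[] args) {
        long[] serverSizes = new long[10];
        long[] clientSizes = new long[10];
        Arrays.fill(serverSizes, 100); // server counts 100 bytes per row
        Arrays.fill(clientSizes, 80);  // client counts only 80 bytes per row

        List<Integer> batch = serverScan(serverSizes, 0, 100, 500);
        System.out.println("rows returned: " + batch.size()); // 5
        // true, even though 5 of the 10 rows were never returned
        System.out.println("client thinks exhausted: "
                + clientThinksExhausted(batch, clientSizes, 100, 500));
    }
}
```

With matching estimates (passing {{serverSizes}} to the client check) the client
counts 500 bytes, which is not below the limit, so it keeps scanning.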

bq. To me, the larger issue seems to be that only a Result[] is returned from 
ScannerCallable

I agree completely. It is ugly that we return ONLY a {{Result[]}} to the client
and then expect it to work out why those particular Results came back from the
server. Was the size limit reached? Was the caching limit reached? Was a
partial result formed? Are there more results on the server, or is the region
exhausted? The client has to infer too many things from the {{Result[]}} alone
when the server already knows the answers. I think it would be great if this
could be cleaned up.
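
One way to remove the guesswork, sketched under loud assumptions: the server
returns, alongside the rows, the reason the batch ended, and the client simply
loops until the server says the region is exhausted. {{StopReason}},
{{ScanBatch}}, {{moreResultsInRegion()}}, {{nextBatch}} and the string-length
"sizes" are hypothetical names for illustration, not HBase's API; partial
results are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are HBase APIs.
final class ExplicitScanResponseDemo {

    // Hypothetical reasons a server-side batch can end.
    enum StopReason { CACHING_LIMIT, SIZE_LIMIT, REGION_EXHAUSTED }

    // Hypothetical response carrying the server's answer next to the rows
    // (the String rows stand in for Result[]).
    static final class ScanBatch {
        final List<String> rows;
        final StopReason stopReason;

        ScanBatch(List<String> rows, StopReason stopReason) {
            this.rows = rows;
            this.stopReason = stopReason;
        }

        // Only the server decides whether the region is done; no client inference.
        boolean moreResultsInRegion() {
            return stopReason != StopReason.REGION_EXHAUSTED;
        }
    }

    // Hypothetical server: a row's "size" is just its string length.
    // Assumes caching >= 1 and maxResultSize > 0, so every batch has a row.
    static ScanBatch nextBatch(List<String> region, int start, int caching, long maxResultSize) {
        List<String> out = new ArrayList<>();
        long size = 0;
        int i = start;
        while (i < region.size()) {
            if (out.size() >= caching) {
                return new ScanBatch(out, StopReason.CACHING_LIMIT);
            }
            if (size >= maxResultSize) {
                return new ScanBatch(out, StopReason.SIZE_LIMIT);
            }
            String row = region.get(i++);
            out.add(row);
            size += row.length();
        }
        return new ScanBatch(out, StopReason.REGION_EXHAUSTED);
    }

    // Client: keep asking until the server itself reports the region exhausted.
    static List<String> scanRegion(List<String> region, int caching, long maxResultSize) {
        List<String> all = new ArrayList<>();
        ScanBatch batch;
        do {
            batch = nextBatch(region, all.size(), caching, maxResultSize);
            all.addAll(batch.rows);
        } while (batch.moreResultsInRegion());
        return all;
    }

    public static void main(String[] args) {
        List<String> region = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            region.add("row" + i);
        }
        // Small limits force many size-limited batches; every row still comes back.
        System.out.println(scanRegion(region, 3, 5).size()); // 10
    }
}
```

Because the client never re-derives sizes, a disagreement between the client's
and the server's size calculations can no longer end the scan early; the cost
is one extra field in each response.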

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), 
> for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of
> the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
