[ 
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303786#comment-14303786
 ] 

Jonathan Lawlor commented on HBASE-11544:
-----------------------------------------

I have started to look into this issue this past week. I have begun by 
investigating how [~lhofhansl]'s solution #1 could be implemented (solution #2 
would be the natural next step afterwards). As discussed above, the current 
implementations of setBatch and setMaxResultSize suggest how we could develop a 
solution for #1:

Currently, if a user calls setBatch on their scan, each call to next() will 
return a partial row (assuming the batch size is less than the number of 
columns in the row). As [~lhofhansl] has called out above, this does 
not break edit atomicity because the scanner maintains the readpoint state on 
the server. This is an important workflow that we could mimic in the 
implementation of solution #1: In the event that the entire row does not fit 
into a chunk, we would be returning partial rows in a manner similar to how 
batching returns partial rows. 
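To make the batching behaviour concrete, here is a minimal plain-Java sketch of how one row's cells get handed back as partial results when setBatch is in effect. BatchSketch and partials are my own names for illustration, not HBase internals:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of setBatch semantics: a row's cells are split across
// successive next() calls, at most batchSize cells per call.
public class BatchSketch {
    static List<List<String>> partials(List<String> rowCells, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < rowCells.size(); i += batchSize) {
            // Each slice is what one next() call would hand back.
            out.add(new ArrayList<>(
                rowCells.subList(i, Math.min(i + batchSize, rowCells.size()))));
        }
        return out;
    }
}
```

A row of 5 cells with a batch size of 2 would come back as three partials of 2, 2, and 1 cells; edit atomicity holds because all three are served against the same readpoint on the server.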

The implementation of setMaxResultSize is a good starting point for the logic 
behind rpcChunkSize, but it currently operates at too high a level. The current 
implementation evaluates the limit on the result size after each row's worth of 
cells is retrieved. Specifically, in the event that the limit has been set, the 
server will run through a loop and on each iteration it will retrieve all the 
cells for one row. The loop will continue until the requested number of rows 
has been retrieved OR the limit on the result size has been reached. 

The reason we would need to modify this for rpcChunkSize is that we want the 
limit to be enforced at the cell level rather than at the row level. If a row 
contains many large cells, the result size limit won't help because the server 
can still OOME while retrieving the cells for that single row. 
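The cell-level check might look roughly like the following plain-Java sketch, where the accumulated size is compared against rpcChunkSize after every cell rather than after every full row. Chunk, nextChunk, and the partial flag are assumed names for illustration, not existing server code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a server-side loop that fills one RPC chunk and can stop
// mid-row, flagging the chunk as partial when cells remain.
public class ChunkSketch {
    static class Chunk {
        final List<byte[]> cells = new ArrayList<>();
        boolean partial;   // true if we stopped before the row was exhausted
    }

    static Chunk nextChunk(List<byte[]> rowCells, long rpcChunkSize) {
        Chunk chunk = new Chunk();
        long accumulated = 0;
        for (int i = 0; i < rowCells.size(); i++) {
            chunk.cells.add(rowCells.get(i));
            accumulated += rowCells.get(i).length;
            // Check the limit after every cell, not after every row.
            if (accumulated >= rpcChunkSize && i + 1 < rowCells.size()) {
                chunk.partial = true;  // the row did not fit into this chunk
                break;
            }
        }
        return chunk;
    }
}
```

With a limit of 15 bytes and three 10-byte cells, the chunk closes after the second cell and is marked partial; with a large enough limit the whole row fits and the flag stays false.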

In the case that we return a partial row due to the chunk size limit, we would 
want to indicate that the result is indeed a partial via some flag in the 
returned results. The flag would be necessary so that the client could 
recognize whether or not it would need to make another RPC request to finish 
the API call before delivering the results to the caller.

A couple of issues that come to mind with the move to this new rpcChunkSize 
architecture are highlighted below:
- Currently, filters are not always compatible with partial rows (as in the 
case of setBatch) because sometimes all of the cells within a row are needed to 
make a decision as to whether or not the row will be filtered. With the 
introduction of rpcChunkSize, the logic behind evaluating filters may need to 
be revised. Does anyone have any comments with respect to how this could be 
handled?
- Solution #1 would not be able to prevent OOMEs that result from a single 
cell being too large (in the same way that the current implementation of 
setMaxResultSize cannot prevent OOMEs that result from a single row being too 
large). The issue of Cells that are too large would need to be addressed with 
the move to the full streaming protocol of solution #2.

In summary, the approach that I am thinking of taking for solution #1 is:
- Remove setMaxResultSize and replace it with a limit that we will call 
rpcChunkSize
- Move the logic for rpcChunkSize down to the Cell level so that we can 
prevent OOMEs that result from trying to fetch an entire row's worth of cells 
- Add a flag to Results that allows the client to determine if the Result is a 
partial (and they need to make more RPC requests to finish off the API call)
- Add logic on the client side to recognize when they need to make more RPC 
requests to finish the API call
- Add a method to combine partial results into a single result before 
delivering them to the caller.
- Still brainstorming how to handle the application of filters server side (any 
advice here would be much appreciated).
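The client-side half of the summary above (the flag check, the extra RPCs, and the combine step) could be sketched as follows. RpcResult, its partial flag, and fetchWholeRow are hypothetical names standing in for whatever the real API would look like:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of client-side handling of partial results: keep issuing RPCs
// while the server flags the result as partial, then combine the pieces
// into one complete row before delivering to the caller.
public class ClientSketch {
    static class RpcResult {
        final List<String> cells;
        final boolean partial;      // proposed flag: more cells remain for this row
        RpcResult(List<String> cells, boolean partial) {
            this.cells = cells;
            this.partial = partial;
        }
    }

    static List<String> fetchWholeRow(Iterator<RpcResult> rpcs) {
        List<String> combined = new ArrayList<>();
        RpcResult r;
        do {
            r = rpcs.next();            // stands in for one scanner RPC
            combined.addAll(r.cells);   // combine partials as they arrive
        } while (r.partial);
        return combined;
    }
}
```

The caller only ever sees the combined result, so from the API's point of view nothing changes; the extra round trips are hidden inside the client.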

Any feedback on my thought process, the issues I raised, and proposed approach 
would be greatly appreciated!

Thanks


> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return 
> batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>              Labels: beginner
>
> Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has 
> large cells.  I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the 
> client whatever we've gathered once we pass out a certain size threshold 
> rather than keep accumulating till we OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
