[
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303786#comment-14303786
]
Jonathan Lawlor commented on HBASE-11544:
-----------------------------------------
I have started to look into this issue this past week. I have begun by
investigating how [~lhofhansl]'s solution #1 could be implemented (solution #2
would be the natural next step afterwards). As discussed above, the current
implementations of setBatch and setMaxResultSize suggest how we could
develop a solution for #1:
Currently, if a user uses the setBatch method on their scan, they will receive
partial rows (assuming the batch size is less than the number of columns in the
row) on each call to next(). As [~lhofhansl] has called out above, this does
not break edit atomicity because the scanner maintains the readpoint state on
the server. This is an important workflow that we could mimic in the
implementation of solution #1: In the event that the entire row does not fit
into a chunk, we would be returning partial rows in a manner similar to how
batching returns partial rows.
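To make that batching behavior concrete, here is a toy, self-contained sketch of how a batch limit splits one row's cells into partial results. This is plain Java, not actual HBase server code; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split one row's cells into partial results of at most `batch` cells,
    // mimicking how setBatch causes next() to return partial rows when the
    // batch size is smaller than the number of columns in the row.
    static List<List<String>> batchRow(List<String> rowCells, int batch) {
        List<List<String>> partials = new ArrayList<>();
        for (int i = 0; i < rowCells.size(); i += batch) {
            partials.add(rowCells.subList(i, Math.min(i + batch, rowCells.size())));
        }
        return partials;
    }

    public static void main(String[] args) {
        List<String> row = List.of("c1", "c2", "c3", "c4", "c5");
        // With batch=2, a 5-column row comes back as three partials:
        // [c1, c2], [c3, c4], [c5]
        System.out.println(batchRow(row, 2));
    }
}
```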
The implementation of setMaxResultSize is a good starting point for the logic
behind rpcChunkSize, but it currently operates at too high a level. The current
implementation evaluates the limit on the result size after each row's worth of
cells is retrieved. Specifically, in the event that the limit has been set, the
server will run through a loop and on each iteration it will retrieve all the
cells for one row. The loop will continue until the requested number of rows
has been retrieved OR the limit on the result size has been reached.
The reason we would need to modify this for rpcChunkSize is that we want the
limit enforced at the cell level rather than at the row level. If a row has
many large cells, the result size limit won't help: the server will OOME while
retrieving the cells for that single row.
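The cell-level check could be simulated roughly as below. Everything here is hypothetical (the fillChunk helper, the sizes, the chunk limit); the point is only where the limit check moves, from once per row to once per cell:

```java
import java.util.List;

public class ChunkSketch {
    // Accumulate one row's cells into a chunk, checking the size limit after
    // every cell rather than after every completed row. Returns the number of
    // cells that made it into the chunk; a value smaller than cellSizes.size()
    // means the chunk ends mid-row and the result would be marked partial.
    static int fillChunk(List<Integer> cellSizes, long chunkLimit) {
        long accumulated = 0;
        int taken = 0;
        for (int size : cellSizes) {
            accumulated += size;
            taken++;
            if (accumulated >= chunkLimit) {
                break; // stop mid-row instead of loading the whole row
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        // A row of five 400-byte cells against a 1000-byte chunk limit:
        // only the first three cells fit, so the row comes back in pieces
        // rather than being accumulated whole on the server.
        System.out.println(fillChunk(List.of(400, 400, 400, 400, 400), 1000));
    }
}
```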
In the case that we return a partial row due to the limits of the chunk size,
we would want to indicate that the result is indeed a partial with some flag in
the returned results. The flag would be necessary so that the client could
recognize whether or not it would need to make another RPC request to finish
the API call before delivering the results to the caller.
A couple issues that come to mind with the move to this new rpcChunkSize
architecture are highlighted below:
- Currently, filters are not always compatible with partial rows (as in the
case of setBatch) because sometimes all of the cells within a row are needed to
make a decision as to whether or not the row will be filtered. With the
introduction of rpcChunkSize, the logic behind evaluating filters may need to
be revised. Does anyone have any comments with respect to how this could be
handled?
- Solution #1 would not be able to prevent OOMEs that result from a single
cell being too large (in the same way that the current implementation of
setMaxResultSize cannot prevent OOMEs that result from a single row being too
large). The issue of cells that are too large would need to be addressed with
the move to the full streaming protocol of solution #2.
In summary, the approach that I am thinking of taking for solution #1 is:
- Remove setMaxResultSize and replace it with a limit that we will call
rpcChunkSize
- Move the logic for rpcChunkSize down to the cell level so that we can
prevent OOMEs that result from trying to fetch an entire row's worth of cells
- Add a flag to Results that allows the client to determine if the Result is a
partial (and they need to make more RPC requests to finish off the API call)
- Add logic on the client side to recognize when they need to make more RPC
requests to finish the API call
- Add a method to combine partial results into a single Result before
delivering it to the caller.
- Still brainstorming how to handle the application of filters server side (any
advice here would be much appreciated).
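The client-side pieces of the summary above can be sketched as follows: a hypothetical partial flag on each returned chunk, and a combine step that stitches partials back into one logical row before delivery to the caller. None of this is existing HBase API; the Chunk type and combine method are stand-ins for the proposed Result flag and client logic:

```java
import java.util.ArrayList;
import java.util.List;

public class PartialResultSketch {
    // A stand-in for Result carrying the proposed partial flag.
    record Chunk(List<String> cells, boolean partial) {}

    // Combine the chunks returned by successive RPCs into one row's worth of
    // cells, as the proposed client-side logic would do before handing a
    // complete Result back to the caller. The client keeps issuing RPCs as
    // long as the last chunk it received is marked partial.
    static List<String> combine(List<Chunk> chunks) {
        List<String> all = new ArrayList<>();
        for (Chunk c : chunks) {
            all.addAll(c.cells());
        }
        return all;
    }

    public static void main(String[] args) {
        // Simulate a row that came back in two RPCs: the first chunk hit the
        // rpcChunkSize limit mid-row and was flagged partial.
        List<Chunk> rpcs = List.of(
            new Chunk(List.of("c1", "c2"), true),
            new Chunk(List.of("c3"), false));
        System.out.println(combine(rpcs)); // [c1, c2, c3]
    }
}
```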
Any feedback on my thought process, the issues I raised, and proposed approach
would be greatly appreciated!
Thanks
> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return
> batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11544
> URL: https://issues.apache.org/jira/browse/HBASE-11544
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Critical
> Labels: beginner
>
> Running some tests, I set hbase.client.scanner.caching=1000. Dataset has
> large cells. I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the
> client whatever we've gathered once we pass out a certain size threshold
> rather than keep accumulating till we OOME.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)