[ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Lawlor updated HBASE-11544:
------------------------------------
Attachment: HBASE-11544-v1.patch
Hey folks,
I've been working on this issue and I am attaching a patch of what I have so
far. Below I have included some discussion points that it would be great to
get feedback on:
A few issues were encountered while implementing a solution for this problem.
The issues, as well as their current solutions, are outlined below (any
feedback on alternative ways to solve these problems would be appreciated):
* In some cases, the concept of partial results doesn't seem appropriate. In
those cases, I ensured that partial results would not be created, as they
would only hurt performance or cause confusion. The cases where I felt
partial results should be avoided are:
** When the client has defined a filter for their scan that
requires the entire row to be read.
** When the client has specified that the scan is a Small scan. Small scans
are designed to execute in a single RPC request, so the idea of having to
make multiple RPC requests to form the complete Result seems inappropriate.
* When I changed the default value of caching to Integer.MAX_VALUE, I ran
into OOMEs on the server, since caching is used to presize the ArrayList that
holds results. A simple solution is to not set an initial size on the
ArrayList at all (see the sketch after this list). However, this solution may
still run into memory issues if the ArrayList must expand its underlying
array many times (e.g. if the table being scanned has many small rows,
leading to a large number of Results in the list). I was wondering what
everyone thought of the simple solution; if a more sophisticated solution is
required, it may be best to move the caching change into its own JIRA.
* When combining the partial results into a single complete Result on the
client side, an exception will be thrown from within ResultScanner#next() if
the partial results are found to belong to different rows (a sketch of this
check follows this list). This is a corner case that should never show up,
since sequence numbers are already used in each RPC request to ensure proper
ordering of requests/responses, but I figured it was worth mentioning.
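To make the presizing point above concrete, here is a minimal sketch of the
"simple solution"; the class and variable names are illustrative, not the
ones in the patch:
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;

// Illustrative only; the real list lives in the server-side scan handler.
public class ResultListSketch {
  static List<Result> newResultList(int caching) {
    // Previous behaviour: presize with the caching value, which OOMEs as soon
    // as the default becomes Integer.MAX_VALUE:
    //   return new ArrayList<Result>(caching);

    // Simple fix: no initial capacity. The cost is occasional array copies as
    // the list grows, e.g. when many small rows produce a large number of
    // Results in a single batch.
    return new ArrayList<Result>();
  }
}
{code}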
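And a rough sketch of the client-side guard for the row-mismatch corner case
above; the surrounding class and method are hypothetical, only the check
itself reflects what the patch does inside ResultScanner#next():
{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialRowCheckSketch {
  // Verify that every partial Result being stitched together shares the same
  // row key; fail loudly otherwise.
  static void checkSameRow(List<Result> partials) throws IOException {
    if (partials.isEmpty()) {
      return;
    }
    byte[] row = partials.get(0).getRow();
    for (Result partial : partials) {
      if (!Bytes.equals(row, partial.getRow())) {
        // Should never happen: RPC sequence numbers already guarantee proper
        // ordering of requests/responses.
        throw new IOException("Partial results belong to different rows: "
            + Bytes.toStringBinary(row) + " vs "
            + Bytes.toStringBinary(partial.getRow()));
      }
    }
  }
}
{code}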
The fine-grained implementation details can be seen in the patch, but I
thought it would be worth highlighting how this new partial result workflow
can be used to avoid OOME on the server:
* The setting of Scan#setMaxResultSize will now operate at the cell level
rather than the row level. This allows a client to retrieve very large rows
in fragments/partials that would previously cause the server to OOME. By
default, the complete Result is only ever formed on the client side; for very
large rows the server only ever sees partial Results.
* A new option (Scan#setAllowPartials) has been added to Scan to allow the
client to see the partial Results returned by the server. This setting will
be useful in cases where the client itself would OOME if it were forced to
reconstruct the complete Result (a usage sketch follows this list).
* If clients want to utilize this partial result workflow, they should
use non-filtered, non-small scans (see issues above for reasoning).
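For reference, here is a minimal sketch of how a client could use this
workflow. Scan#setMaxResultSize already exists; Scan#setAllowPartials is the
name used in the attached patch and may still change; the table handle and
process() are placeholders:
{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class PartialScanSketch {
  static void scanLargeRows(Table table) throws IOException {
    Scan scan = new Scan();
    scan.setMaxResultSize(2 * 1024 * 1024); // cell-level cap per RPC response, in bytes
    scan.setAllowPartials(true);            // added by the patch; lets the client see partials

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // With setAllowPartials(true) a single large row may arrive as several
        // partial Results; without it, the client library stitches partials
        // back into complete rows before handing them out.
        process(result);
      }
    } finally {
      scanner.close();
    }
  }

  static void process(Result result) {
    // application-specific handling of each (possibly partial) Result
  }
}
{code}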
Areas for future improvement:
* As [~lhofhansl] has pointed out, RPC is inefficient and could be improved
by prefetching results server-side. This has been raised in HBASE-12994.
* As called out in the issues above, the initial sizing of the ArrayList on
the server side could be improved to avoid resizing of the underlying array.
* Streaming would be the ideal workflow for RPC requests but will require a
large rework.
Any feedback on the patch would be greatly appreciated. I am expecting the QA
run to come back with some test failures, which I will address in a
subsequent patch. I'm pinging [~lhofhansl] and [~stack] as we were discussing
this solution above, but if anyone else has any feedback it would be
appreciated as well!
Thanks
> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return
> batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11544
> URL: https://issues.apache.org/jira/browse/HBASE-11544
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Jonathan Lawlor
> Priority: Critical
> Labels: beginner
> Attachments: HBASE-11544-v1.patch
>
>
> Running some tests, I set hbase.client.scanner.caching=1000. Dataset has
> large cells. I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the
> client whatever we've gathered once we pass out a certain size threshold
> rather than keep accumulating till we OOME.