[
https://issues.apache.org/jira/browse/ACCUMULO-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544153#comment-14544153
]
Josh Elser commented on ACCUMULO-3710:
--------------------------------------
bq. It'd be good to have some mitigation for this, in order to avoid the
potentially large number of ranges being issued, like in ACCUMULO-3602.
Agreed. It would be good to do some more chunking in the client to prevent
spamming a tserver w/ an exorbitant number of ranges. My hunch is that optimizing
small, disjoint collections of ranges more efficiently is separate work.
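For illustration, a rough sketch of what client-side chunking could look like.
RangeChunker and chunk() are hypothetical names, not part of the Accumulo API,
and the helper is generic so it runs without Accumulo on the classpath; in
practice T would be Range, and the chunk size (e.g. ~10k) is a guess that would
need tuning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Accumulo API.
final class RangeChunker {
    private RangeChunker() {}

    /** Splits items into consecutive lists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

A caller could then pass each chunk to BatchScanner.setRanges() in turn and
drain the scanner before moving on to the next chunk, so that no single request
carries the full set of ranges.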
> Scanning with many singleton ranges crashes tserver
> ---------------------------------------------------
>
> Key: ACCUMULO-3710
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3710
> Project: Accumulo
> Issue Type: Bug
> Components: client, tserver
> Affects Versions: 1.6.1
> Reporter: Dylan Hutchison
> Fix For: 1.8.0
>
>
> Setup: single-node standalone 1.6.1 Accumulo instance.
> Use case: scan ~1M individual rows, scattered across a ~15GB table.
> The following steps crash the TabletServer:
> 1. Gather a List of Range objects, each one a singleton range spanning an
> entire row.
> 2. Create a BatchScanner with one read thread.
> 3. Set the ranges via BatchScanner.setRanges()
> 4. Start iterating through the scanner.
> One solution is to batch the reads into groups of ~10k ranges.
> Comment from Josh Elser:
> {quote}
> Taking a quick glance at the code, it looks like this would be a good place
> to do some optimization in the BatchScanner's impl
> (TabletServerBatchReaderImpl). The BatchScanner will bin the ranges to the
> tablets and the servers hosting those tablets. Normally, this would be spread
> out, but in your single-server case, all 1M rows would go to a single
> TabletServer in one RPC call.
> I'm guessing a good optimization here would be to check the size of a batch
> of Ranges for a single tabletserver, and when above a certain threshold,
> split the batch in half and try to reprocess each half (the recursion would
> naturally keep splitting until we get down to some high-watermark).
> Point being, if your client VM constructed the Ranges without issue, the
> BatchScanner impl should be smart enough to not knock over a TabletServer.
> {quote}
> Verified to cause an OOME via tserver_localhost.out:
> {quote}
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> # Executing /bin/sh -c "kill -9 12833"...
> {quote}
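The split-in-half idea from the quoted comment can be sketched roughly as
follows. This is only an illustration under assumptions: BatchSplitter,
splitUntilBelow() and maxBatchSize are hypothetical names, the code is not taken
from TabletServerBatchReaderImpl, and the helper is generic so it runs without
Accumulo (T would be Range for a per-tabletserver batch).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Accumulo code.
final class BatchSplitter {
    private BatchSplitter() {}

    /** Recursively halves batch until every piece has at most maxBatchSize items. */
    static <T> List<List<T>> splitUntilBelow(List<T> batch, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        split(batch, maxBatchSize, out);
        return out;
    }

    private static <T> void split(List<T> batch, int maxBatchSize, List<List<T>> out) {
        if (batch.isEmpty()) {
            return;
        }
        if (batch.size() <= maxBatchSize) {
            // Small enough: emit a copy that is independent of the input list.
            out.add(new ArrayList<>(batch));
            return;
        }
        // Too large: split in half and reprocess each half in order.
        int mid = batch.size() / 2;
        split(batch.subList(0, mid), maxBatchSize, out);
        split(batch.subList(mid, batch.size()), maxBatchSize, out);
    }
}
```

Because each step halves the batch, the recursion depth grows only
logarithmically with the number of ranges, and the original order of the ranges
is preserved across the resulting pieces.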