[
https://issues.apache.org/jira/browse/ACCUMULO-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher Tubbs resolved ACCUMULO-3710.
-----------------------------------------
Resolution: Abandoned
Closing this stale issue. If this is still a problem, please create a new issue
or PR at https://github.com/apache/accumulo
> Scanning with many singleton ranges crashes tserver
> ---------------------------------------------------
>
> Key: ACCUMULO-3710
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3710
> Project: Accumulo
> Issue Type: Bug
> Components: client, tserver
> Affects Versions: 1.6.1
> Reporter: Shana Hutchison
> Priority: Major
>
> Setup: single-node standalone 1.6.1 Accumulo instance.
> Use case: scan ~1M individual rows, scattered across a ~15GB table.
> The following steps crash the TabletServer:
> 1. Gather a List of Range objects, each one a singleton range spanning an
> entire row.
> 2. Create a BatchScanner with one read thread.
> 3. Set the ranges via BatchScanner.setRanges()
> 4. Start iterating through the scanner.
> One solution is to batch the reads into groups of ~10k ranges.
> Comment from Josh Elser:
> {quote}
> Taking a quick glance at the code, it looks like this would be a good place
> to do some optimization in the BatchScanner's impl
> (TabletServerBatchReaderImpl). The BatchScanner will bin the ranges to the
> tablets and the servers hosting those tablets. Normally, this would be spread
> out, but in your single-server case, all 1M rows would go to a single
> TabletServer in one RPC call.
> I'm guessing a good optimization here would be to check the size of a batch
> of Ranges for a single tabletserver, and when above a certain threshold,
> split the batch in half and try to reprocess each half (the recursion would
> naturally keep splitting until we get down to some high-watermark).
> Point being, if your client VM constructed the Ranges without issue, the
> BatchScanner impl should be smart enough to not knock over a TabletServer.
> {quote}
> Verified to cause an OOME via tserver_localhost.out:
> {quote}
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> # Executing /bin/sh -c "kill -9 12833"...
> {quote}
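
The two approaches mentioned in the issue (fixed-size batches of ~10k ranges, and recursively halving an oversized batch until it is under a threshold) can be sketched roughly as below. This is only an illustration, not the BatchScanner implementation: the class name RangeBatching and its methods are hypothetical, and the generic type T stands in for Accumulo's Range so the sketch compiles without Accumulo on the classpath. The threshold is a parameter; ~10k is the figure from the report, not a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Accumulo.
final class RangeBatching {

    // Workaround from the report: cut the full list into fixed-size batches.
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Idea from the quoted comment: if a batch is above the threshold,
    // split it in half and reprocess each half until every piece fits.
    static <T> List<List<T>> splitUntilSmall(List<T> batch, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        split(batch, maxSize, out);
        return out;
    }

    private static <T> void split(List<T> batch, int maxSize, List<List<T>> out) {
        if (batch.isEmpty()) {
            return;
        }
        if (batch.size() <= maxSize) {
            out.add(new ArrayList<>(batch));
            return;
        }
        // size > maxSize >= 1, so both halves are non-empty and strictly smaller.
        int mid = batch.size() / 2;
        split(batch.subList(0, mid), maxSize, out);
        split(batch.subList(mid, batch.size()), maxSize, out);
    }
}
```

On the client side, each resulting batch would presumably be passed to BatchScanner.setRanges() and the scanner fully iterated before moving on to the next batch, so that no single request carries all ~1M ranges.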