[ 
https://issues.apache.org/jira/browse/ACCUMULO-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544153#comment-14544153
 ] 

Josh Elser commented on ACCUMULO-3710:
--------------------------------------

bq. It'd be good to have some mitigation for this, in order to avoid the 
potentially large number of ranges being issued, like in ACCUMULO-3602.

Agreed. Doing some more chunking in the client to prevent spamming a tserver w/
an exorbitant number of ranges would be good to do. My hunch is that handling
small, disjoint collections of ranges more efficiently is separate work.
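Here is a minimal sketch of the client-side chunking workaround. It assumes the caller has already built the full list of ranges and splits that list into fixed-size groups; the issue suggests groups of ~10k. The sketch is plain Java with a generic element type rather than Accumulo's `Range`, so it compiles on its own. `RangeChunker` and `chunk` are hypothetical names, not part of the Accumulo client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Accumulo): splits a large range list into bounded groups. */
final class RangeChunker {
    private RangeChunker() {}

    /** Returns consecutive, independent copies of at most maxPerChunk elements each, in order. */
    static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < items.size()) {
            // Overflow-safe end index: never past the end of the list.
            int end = start + Math.min(maxPerChunk, items.size() - start);
            // Copy the subList view so each chunk does not depend on the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return chunks;
    }
}
```

With the Accumulo client, each chunk would be passed to `BatchScanner.setRanges()` and scanned to completion before the next chunk is submitted. That way no single scan hands the tserver more than `maxPerChunk` ranges.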

> Scanning with many singleton ranges crashes tserver
> ---------------------------------------------------
>
>                 Key: ACCUMULO-3710
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3710
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.6.1
>            Reporter: Dylan Hutchison
>             Fix For: 1.8.0
>
>
> Setup: single-node standalone 1.6.1 Accumulo instance.
> Use case: scan ~1M individual rows, scattered across a ~15GB table.  
> The following steps crash the TabletServer:
> 1. Gather a List of Range objects, each one a singleton range spanning an 
> entire row.
> 2. Create a BatchScanner with one read thread.
> 3. Set the ranges via BatchScanner.setRanges()
> 4. Start iterating through the scanner.
> One solution is to batch the reads into groups of ~10k ranges.
> Comment from Josh Elser:
> {quote}
> Taking a quick glance at the code, it looks like this would be a good place 
> to do some optimization in the BatchScanner's impl 
> (TabletServerBatchReaderImpl). The BatchScanner will bin the ranges to the 
> tablets and the servers hosting those tablets. Normally, this would be spread 
> out, but in your single-server case, all 1M rows would go to a single
> TabletServer in one RPC call.
> I'm guessing a good optimization here would be to check the size of a batch 
> of Ranges for a single tabletserver, and when above a certain threshold, 
> split the batch in half and try to reprocess each half (the recursion would 
> naturally keep splitting until we get down to some high-watermark).
> Point being, if your client VM constructed the Ranges without issue, the 
> BatchScanner impl should be smart enough to not knock over a TabletServer.
> {quote}
> Verified to cause an OOME, as shown in tserver_localhost.out:
> {quote}
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 12833"...
> {quote}
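The recursive split from Josh Elser's quoted comment can be sketched as follows. The sketch assumes the ranges have already been binned by tserver into a `Map` from a server name to that server's ranges, and it uses a hypothetical `highWatermark` threshold. It is plain Java with a generic element type, not the actual `TabletServerBatchReaderImpl` code. `RangeSplitter` and `splitBins` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not TabletServerBatchReaderImpl): caps per-tserver batches by recursive halving. */
final class RangeSplitter {
    private RangeSplitter() {}

    /**
     * For each server's batch of ranges, halves it recursively until every
     * piece holds at most highWatermark ranges; each piece would be one RPC.
     */
    static <T> Map<String, List<List<T>>> splitBins(Map<String, List<T>> binnedByServer,
                                                    int highWatermark) {
        if (highWatermark <= 0) {
            throw new IllegalArgumentException("highWatermark must be positive");
        }
        Map<String, List<List<T>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<T>> bin : binnedByServer.entrySet()) {
            List<List<T>> pieces = new ArrayList<>();
            split(bin.getValue(), highWatermark, pieces);
            result.put(bin.getKey(), pieces);
        }
        return result;
    }

    // Skips empty batches, emits a copy of a batch that fits under the watermark,
    // and otherwise recurses on each half.
    private static <T> void split(List<T> batch, int highWatermark, List<List<T>> out) {
        if (batch.isEmpty()) {
            return;
        }
        if (batch.size() <= highWatermark) {
            out.add(new ArrayList<>(batch));
            return;
        }
        int mid = batch.size() / 2;
        split(batch.subList(0, mid), highWatermark, out);
        split(batch.subList(mid, batch.size()), highWatermark, out);
    }
}
```

Halving keeps the pieces roughly equal in size. Every piece produced by a split holds between about half the watermark and the full watermark. The recursion depth grows only logarithmically with the size of the original batch, so even the single-server case with ~1M ranges stays shallow.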



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
