[ 
https://issues.apache.org/jira/browse/ACCUMULO-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812391#comment-15812391
 ] 

Keith Turner commented on ACCUMULO-4562:
----------------------------------------

I was thinking about the batch scanner spliterator.  It may be more efficient to 
leave the batch scanner's queue as is (a single queue of batches of key/vals) and 
have each spliterator pull batches off.  Since the queue has batches on it 
(instead of single key/values) there should not be much concurrency contention.  
This model results in a single queue with multiple threads servicing it, which is 
more resilient than multiple queues when a servicing thread gets bogged down.
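
A rough sketch of the single-queue idea, to make the shape concrete.  This is 
not Accumulo code: the class name BatchQueueSpliterator, the end-marker batch, 
and the split budget are all assumptions made for illustration.  Every 
spliterator produced by trySplit() shares the same queue, so a slow consumer 
only delays the one batch it currently holds.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: several spliterators drain ONE shared queue of batches.
final class BatchQueueSpliterator<T> implements Spliterator<T> {
  private final BlockingQueue<List<T>> queue; // single shared queue of batches
  private final List<T> endMarker;            // sentinel batch, compared by identity
  private final AtomicInteger splitsLeft;     // shared budget that bounds trySplit()
  private Iterator<T> current = Collections.emptyIterator();
  private boolean done = false;

  BatchQueueSpliterator(BlockingQueue<List<T>> queue, List<T> endMarker, int maxSplits) {
    this(queue, endMarker, new AtomicInteger(maxSplits));
  }

  private BatchQueueSpliterator(BlockingQueue<List<T>> queue, List<T> endMarker,
      AtomicInteger splitsLeft) {
    this.queue = queue;
    this.endMarker = endMarker;
    this.splitsLeft = splitsLeft;
  }

  @Override
  public boolean tryAdvance(Consumer<? super T> action) {
    while (!current.hasNext()) {
      if (done) {
        return false;
      }
      try {
        List<T> batch = queue.take(); // blocks until a producer adds a batch
        if (batch == endMarker) {
          // Put the marker back so sibling spliterators also see the end.
          // Assumes producers are finished, so this does not block for long.
          queue.put(endMarker);
          done = true;
          return false;
        }
        current = batch.iterator();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        done = true;
        return false;
      }
    }
    action.accept(current.next());
    return true;
  }

  @Override
  public Spliterator<T> trySplit() {
    // The size is unknown, so the stream framework would keep splitting;
    // the shared budget makes trySplit() eventually return null.
    if (done || splitsLeft.getAndUpdate(n -> n > 0 ? n - 1 : 0) <= 0) {
      return null;
    }
    return new BatchQueueSpliterator<>(queue, endMarker, splitsLeft);
  }

  @Override
  public long estimateSize() {
    return Long.MAX_VALUE; // unknown size
  }

  @Override
  public int characteristics() {
    return 0; // unordered and unsized; batches arrive in no particular order
  }
}
```

Something like StreamSupport.stream(new BatchQueueSpliterator<>(queue, end, 
3), true) would then give a parallel stream over the batches.  One caveat 
with this sketch is that take() blocks a fork/join worker thread while it 
waits for the next batch, which a real implementation would need to consider.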

> Consider Adding Java 8 Stream support to scanners
> -------------------------------------------------
>
>                 Key: ACCUMULO-4562
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4562
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>             Fix For: 2.0.0
>
>
> For a test I wanted to find the min and max timestamp of an Accumulo table.  
> I used Java 8 streams to do that as follows.  The code 
> {{StreamSupport.stream(scanner.spliterator(), false)}} is a standard way in 
> Java 8 to create a stream from an Iterable.
> {code:java}
>     try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
>       Stream<Entry<Key,Value>> stream = 
>             StreamSupport.stream(scanner.spliterator(), false);
>       LongSummaryStatistics stats = stream
>          .mapToLong(e -> e.getKey().getTimestamp())
>          .summaryStatistics();
>       System.out.println(stats);
>     }
> {code}
> In Java 8, collections have the {{stream()}} and {{parallelStream()}} 
> methods.  If ScannerBase had those methods in Accumulo, then the following 
> could be written w/o using {{StreamSupport}}
> {code:java}
>     try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
>       LongSummaryStatistics stats = scanner.stream()
>          .mapToLong(e -> e.getKey().getTimestamp())
>          .summaryStatistics();
>       System.out.println(stats);
>     }
> {code}
> For the BatchScanner I think we could implement a parallel stream.  One way 
> to do this would be to create an internal batch scanner queue for each Java 
> 8 spliterator.  Currently the BatchScanner has one queue that all 
> background threads put batches of key/values on.  With multiple queues, each 
> background thread could break its batches into equal sizes and put a subset 
> on each queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
