[
https://issues.apache.org/jira/browse/ACCUMULO-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812906#comment-15812906
]
Keith Turner commented on ACCUMULO-4562:
----------------------------------------
I was discussing this issue with [~ctubbsii] and the issue of spliterators
blocking came up. I looked into the implications of this and found having a
spliterator that blocks is not great for a parallel stream. See the following
links for background on why
https://www.tobyhobson.co.uk/java-8-parallel-streams-fork-join-pool/
https://bugs.openjdk.java.net/browse/JDK-8161685
> Consider Adding Java 8 Stream support to scanners
> -------------------------------------------------
>
> Key: ACCUMULO-4562
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4562
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Keith Turner
> Fix For: 2.0.0
>
>
> For a test I wanted to find the min and max timestamp of an Accumulo table.
> I used Java 8 streams to do that as follows. The code
> {{StreamSupport.stream(scanner.spliterator(), false)}} is a standard way in
> Java 8 to create a stream from an Iterable.
> {code:java}
> try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
> Stream<Entry<Key,Value>> stream =
> StreamSupport.stream(scanner.spliterator(), false);
> LongSummaryStatistics stats = stream
> .mapToLong(e -> e.getKey().getTimestamp())
> .summaryStatistics();
> System.out.println(stats);
> }
> {code}
> In Java 8, collections have the {{stream()}} and {{parallelStream()}}
> methods. If ScannerBase had those methods in Accumulo, then the following
> could be written w/o using {{StreamSupport}}
> {code:java}
> try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
> LongSummaryStatistics stats = scanner.stream()
> .mapToLong(e -> e.getKey().getTimestamp())
> .summaryStatistics();
> System.out.println(stats);
> }
> {code}
> For the BatchScanner I think we could implement a parallel stream. One way
> to do this would be a to create an internal batch scanner queue for each Java
> 8 split iterator. Currently the BatchScanner has one queue that all
> background threads put batches of key values on. With multiple queues, each
> background thread could break its batches into equal sizes and put a subset
> on each queue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)