Keith Turner created ACCUMULO-4562:
--------------------------------------
Summary: Consider Adding Java 8 Stream support to scanners
Key: ACCUMULO-4562
URL: https://issues.apache.org/jira/browse/ACCUMULO-4562
Project: Accumulo
Issue Type: Improvement
Reporter: Keith Turner
Fix For: 2.0.0
For a test, I wanted to find the minimum and maximum timestamps in an Accumulo
table, and I used Java 8 streams to do so, as shown below. The call
{{StreamSupport.stream(scanner.spliterator(), false)}} is the standard Java 8
way to create a sequential stream from an Iterable.
{code:java}
try (Scanner scanner = c.createScanner(table, Authorizations.EMPTY)) {
  Stream<Entry<Key,Value>> stream =
      StreamSupport.stream(scanner.spliterator(), false);
  LongSummaryStatistics stats = stream
      .mapToLong(e -> e.getKey().getTimestamp())
      .summaryStatistics();
  System.out.println(stats);
}
{code}
In Java 8, collections have the {{stream()}} and {{parallelStream()}} methods.
If ScannerBase in Accumulo had those methods, the following could be written
without using {{StreamSupport}}:
{code:java}
try (Scanner scanner = c.createScanner(table, Authorizations.EMPTY)) {
  LongSummaryStatistics stats = scanner.stream()
      .mapToLong(e -> e.getKey().getTimestamp())
      .summaryStatistics();
  System.out.println(stats);
}
{code}
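One possible shape for this, sketched without any Accumulo dependency: a default
{{stream()}} method on an interface that extends Iterable. The names
{{StreamableScanner}} and {{StreamableScannerDemo}} are hypothetical stand-ins
used only for illustration; this is not the actual ScannerBase API, just a
sketch of how a default method could delegate to {{StreamSupport}}.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for ScannerBase: any implementation that provides
// iterator() gets stream() for free through the default method.
interface StreamableScanner<T> extends Iterable<T> {
  default Stream<T> stream() {
    // Same call as in the issue text, hidden behind a convenience method.
    return StreamSupport.stream(spliterator(), false);
  }
}

class StreamableScannerDemo {
  // Hypothetical in-memory "scanner" over a list of timestamps.
  static StreamableScanner<Long> of(List<Long> timestamps) {
    return timestamps::iterator;
  }

  // Compute min, max, count, and so on without touching StreamSupport directly.
  static LongSummaryStatistics stats(StreamableScanner<Long> scanner) {
    return scanner.stream().mapToLong(Long::longValue).summaryStatistics();
  }

  public static void main(String[] args) {
    System.out.println(stats(of(Arrays.asList(30L, 10L, 20L))));
  }
}
```

Because the default method relies only on {{spliterator()}}, which Iterable
already supplies, existing implementations would not need to change under this
assumption.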
For the BatchScanner, I think we could implement a parallel stream. One way to
do this would be to create an internal batch scanner queue for each Java 8
spliterator. Currently the BatchScanner has one queue onto which all background
threads put batches of key/value pairs. With multiple queues, each background
thread could break its batches into equal-sized subsets and put one subset on
each queue.
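
As a rough illustration of the splitting step only, here is a minimal sketch
that divides one batch into near-equal contiguous slices and places one slice
on each queue. The class {{BatchDistributor}} and its methods are hypothetical
and do not reflect the BatchScanner internals; the queues are assumed to be
unbounded so that {{add}} never fails, and the spliterators that would consume
each queue are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchDistributor {
  // Create one unbounded queue per consumer (for example, one per spliterator).
  static <T> List<BlockingQueue<List<T>>> newQueues(int n) {
    List<BlockingQueue<List<T>>> queues = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
    return queues;
  }

  // Split one batch into queues.size() contiguous slices whose sizes differ
  // by at most one, and add one slice to each queue. Empty slices are skipped.
  static <T> void distribute(List<T> batch, List<BlockingQueue<List<T>>> queues) {
    int n = queues.size();
    int base = batch.size() / n;
    int extra = batch.size() % n; // the first 'extra' queues get one more item
    int start = 0;
    for (int i = 0; i < n; i++) {
      int len = base + (i < extra ? 1 : 0);
      if (len > 0) {
        // Copy the slice so it does not alias the producer's batch.
        queues.get(i).add(new ArrayList<>(batch.subList(start, start + len)));
      }
      start += len;
    }
  }

  public static void main(String[] args) {
    List<BlockingQueue<List<Integer>>> queues = newQueues(3);
    distribute(Arrays.asList(1, 2, 3, 4, 5, 6, 7), queues);
    for (BlockingQueue<List<Integer>> q : queues) {
      System.out.println(q.peek());
    }
  }
}
```

A real implementation would likely use bounded queues with blocking puts to
apply back-pressure on the background threads; that is left out here to keep
the sketch short.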