Keith Turner created ACCUMULO-4562:
--------------------------------------

             Summary: Consider Adding Java 8 Stream support to scanners
                 Key: ACCUMULO-4562
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4562
             Project: Accumulo
          Issue Type: Improvement
            Reporter: Keith Turner
             Fix For: 2.0.0


For a test I wanted to find the min and max timestamp of an Accumulo table.  I 
used Java 8 streams to do that as follows.  The code 
{{StreamSupport.stream(scanner.spliterator(), false)}} is a standard way in 
Java 8 to create a stream from an Iterable.

{code:java}
    try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
      Stream<Entry<Key,Value>> stream = 
            StreamSupport.stream(scanner.spliterator(), false);
      LongSummaryStatistics stats = stream
         .mapToLong(e -> e.getKey().getTimestamp())
         .summaryStatistics();
      System.out.println(stats);
    }
{code}

In Java 8, the {{Collection}} interface has {{stream()}} and {{parallelStream()}}
methods.  If Accumulo's {{ScannerBase}} had those methods, the following could
be written without {{StreamSupport}}:

{code:java}
    try(Scanner scanner = c.createScanner(table, Authorizations.EMPTY)){
      LongSummaryStatistics stats = scanner.stream()
         .mapToLong(e -> e.getKey().getTimestamp())
         .summaryStatistics();
      System.out.println(stats);
    }
{code}
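A minimal sketch of one way the proposed methods could be provided, assuming they are added as default methods built on {{Iterable.spliterator()}}. This is the same approach {{java.util.Collection}} uses for its own {{stream()}} and {{parallelStream()}}. {{StreamableScanner}} and {{ListScanner}} are hypothetical names, not Accumulo API. The stand-in scanner holds (row, timestamp) pairs instead of {{Key}}/{{Value}} entries so the example runs without a cluster.

{code:java}
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map.Entry;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class StreamableScannerSketch {

  // Hypothetical mix-in, not an Accumulo API: stream() and parallelStream()
  // are default methods built on Iterable.spliterator(), the same way
  // java.util.Collection defines its own stream() methods.
  interface StreamableScanner<E> extends Iterable<E> {
    default Stream<E> stream() {
      return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
      return StreamSupport.stream(spliterator(), true);
    }
  }

  // Hypothetical in-memory stand-in for a Scanner whose entries are
  // (row, timestamp) pairs rather than Accumulo Key/Value pairs.
  static class ListScanner implements StreamableScanner<Entry<String, Long>> {
    private final List<Entry<String, Long>> entries;

    ListScanner(List<Entry<String, Long>> entries) {
      this.entries = entries;
    }

    @Override
    public Iterator<Entry<String, Long>> iterator() {
      return entries.iterator();
    }
  }

  // Same shape as the proposed scanner.stream() pipeline above.
  static LongSummaryStatistics demo() {
    ListScanner scanner = new ListScanner(Arrays.<Entry<String, Long>>asList(
        new SimpleImmutableEntry<>("row1", 42L),
        new SimpleImmutableEntry<>("row2", 7L),
        new SimpleImmutableEntry<>("row3", 19L)));
    return scanner.stream().mapToLong(Entry::getValue).summaryStatistics();
  }

  public static void main(String[] args) {
    LongSummaryStatistics stats = demo();
    System.out.println(stats.getMin() + " " + stats.getMax()); // prints "7 42"
  }
}
{code}

With only the default {{Iterable.spliterator()}} behind it, {{parallelStream()}} still reads every entry through a single iterator. The JDK parallelizes it by copying batches of elements into arrays, so a BatchScanner-specific spliterator would likely be needed to feed parallel workers directly from the background threads.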

For the BatchScanner, I think we could implement a parallel stream.  One way to
do this would be to create an internal batch scanner queue for each Java 8
{{Spliterator}}.  Currently the BatchScanner has one queue onto which all
background threads put batches of key/value pairs.  With multiple queues, each
background thread could split each of its batches into equal-sized parts and put
one part on each queue.
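The multi-queue idea can be sketched with plain JDK classes. This is a minimal, hypothetical model, not BatchScanner code. Ordinary threads stand in for the background scan threads, {{Long}} values stand in for key/value entries, and an identity-compared empty list ({{END}}) marks the end of each queue. Each producer cuts every batch into near-equal contiguous chunks, one per queue. The {{Spliterator}} owns a range of queues and gives half of them away on each {{trySplit()}}, so each parallel worker ends up draining its own queues.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Spliterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical model of the multi-queue idea; not Accumulo code.
public class MultiQueueSketch {
  // Sentinel chunk that marks the end of a queue (compared by identity).
  static final List<Long> END = new ArrayList<>();

  // Called by a background thread: cut one batch into near-equal chunks, chunk i -> queue i.
  static void distribute(List<Long> batch, List<BlockingQueue<List<Long>>> queues)
      throws InterruptedException {
    int n = queues.size();
    for (int i = 0; i < n; i++) {
      int from = batch.size() * i / n, to = batch.size() * (i + 1) / n;
      if (from < to) queues.get(i).put(new ArrayList<>(batch.subList(from, to)));
    }
  }

  // Spliterator over queues [lo, hi). trySplit hands half of its queues to another
  // worker; one that still owns several queues drains them in order.
  static class QueueSpliterator implements Spliterator<Long> {
    private final List<BlockingQueue<List<Long>>> queues;
    private final int hi;
    private int lo, pos;
    private List<Long> chunk; // chunk currently being read from queue lo
    QueueSpliterator(List<BlockingQueue<List<Long>>> queues, int lo, int hi) {
      this.queues = queues; this.lo = lo; this.hi = hi;
    }
    @Override public boolean tryAdvance(Consumer<? super Long> action) {
      while (lo < hi) {
        if (chunk != null && pos < chunk.size()) {
          action.accept(chunk.get(pos++));
          return true;
        }
        try {
          chunk = queues.get(lo).take(); // blocks until a producer delivers
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        }
        pos = 0;
        if (chunk == END) { chunk = null; lo++; } // queue lo is done, go to the next
      }
      return false;
    }
    @Override public Spliterator<Long> trySplit() {
      if (chunk != null || hi - lo < 2) return null; // only split before reading
      int mid = (lo + hi) >>> 1;
      QueueSpliterator prefix = new QueueSpliterator(queues, lo, mid);
      lo = mid;
      return prefix;
    }
    @Override public long estimateSize() { return Long.MAX_VALUE; } // unknown size
    @Override public int characteristics() { return NONNULL; }
  }

  // Runs numThreads fake producers; thread t sends 3 batches of 10 values t*1000+0..29.
  static LongSummaryStatistics run(int numQueues, int numThreads) {
    List<BlockingQueue<List<Long>>> queues = new ArrayList<>();
    for (int i = 0; i < numQueues; i++) queues.add(new LinkedBlockingQueue<>());
    List<Thread> producers = new ArrayList<>();
    for (int t = 0; t < numThreads; t++) {
      final long base = t * 1000L;
      Thread p = new Thread(() -> {
        try {
          for (int b = 0; b < 3; b++) {
            List<Long> batch = new ArrayList<>();
            for (int k = 0; k < 10; k++) batch.add(base + b * 10 + k);
            distribute(batch, queues);
          }
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      });
      producers.add(p);
      p.start();
    }
    // Once every producer has finished, put the end marker on each queue.
    new Thread(() -> {
      try {
        for (Thread p : producers) p.join();
        for (BlockingQueue<List<Long>> q : queues) q.put(END);
      } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }).start();
    return StreamSupport.stream(new QueueSpliterator(queues, 0, numQueues), true)
        .mapToLong(Long::longValue).summaryStatistics();
  }

  public static void main(String[] args) {
    System.out.println(run(4, 3));
  }
}
{code}

Because the queues are unbounded, producers in this sketch never block. A real implementation would presumably bound the queues to limit memory, which makes shutdown and error propagation more involved.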



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
