[ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419922#comment-13419922 ]

Adam Fuchs commented on ACCUMULO-697:
-------------------------------------

I like the concept, but does this go far enough? If Values aren't special, are
Keys special, and if so, why? Should we make our SortedKeyValueIterator
implement Iterable<? extends Object>? Then the bottom-level iterator (the RFile
reader) would yield KeyValue or Entry<Key,Value> objects, the top-level
iterator for scans would have to produce serializable objects, and the
top-level iterator for compactions would have to implement
Iterable<Entry<Key,Value>>.
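A minimal sketch of the layered Iterable idea, under loud assumptions: Key and Value are simplified to String, and the names IterableStackSketch, bottom, map, scan and compact are hypothetical, not Accumulo API. The bottom stage yields sorted entries, a scan may end in any Serializable element type, and a compaction only accepts a stack whose top yields entries again.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: every iterator stage is just an Iterable<T>.
// Key and Value are simplified to String; none of these names are Accumulo API.
public class IterableStackSketch {

  // Bottom level, standing in for the RFile reader: yields sorted key/value entries.
  public static Iterable<Map.Entry<String, String>> bottom(SortedMap<String, String> data) {
    return data.entrySet();
  }

  // A generic stage that lazily transforms an Iterable<A> into an Iterable<B>.
  public static <A, B> Iterable<B> map(Iterable<A> source, Function<? super A, ? extends B> f) {
    return () -> new Iterator<B>() {
      private final Iterator<A> it = source.iterator();

      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public B next() {
        return f.apply(it.next());
      }
    };
  }

  // Top level for a scan: any Serializable element type could be sent to the client.
  public static <T extends Serializable> List<T> scan(Iterable<T> top) {
    List<T> out = new ArrayList<>();
    for (T t : top) {
      out.add(t);
    }
    return out;
  }

  // Top level for a compaction: the parameter type forces the stack to yield entries again.
  public static List<Map.Entry<String, String>> compact(Iterable<Map.Entry<String, String>> top) {
    List<Map.Entry<String, String>> out = new ArrayList<>();
    for (Map.Entry<String, String> e : top) {
      out.add(e);
    }
    return out;
  }

  public static void main(String[] args) {
    SortedMap<String, String> data = new TreeMap<>();
    data.put("apple", "3");
    data.put("banana", "5");
    // A word-count style stack: the top stage emits Long counts instead of entries.
    Iterable<Long> counts = map(bottom(data), e -> Long.valueOf(e.getValue()));
    System.out.println(scan(counts)); // [3, 5]
    System.out.println(compact(bottom(data)).size()); // 2
  }
}
```

The compiler, rather than a runtime convention, is what keeps a compaction stack from ending in a non-entry type in this sketch.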

One of the problems we have with iterators now is that the Key and Value are
accessed through separate methods (getTopKey() and getTopValue()), even though
they're always read off of disk together. Splitting up the Key and Value on
the server side is somewhat arbitrary and could reduce our ability to
parallelize iterators (if we ever decide that's something we want to do).

Another problem is that SortedKeyValueIterator falls somewhere in between 
Java's Iterator and Iterable interfaces. SortedKeyValueIterator holds onto 
filters, aggregation parameters, etc. that make it act like a collection, and 
it keeps a pointer to somewhere in that collection like an Iterator. I think we
should change SortedKeyValueIterator into something more like an immutable
collection, or a consistent, isolated, unchanging view of the data, and have it
implement Iterable. That might open up opportunities for automated optimization
of queries on the server side, or better support for built-in languages for
defining iterator trees.
                
> Break Scanner parameterization from Key,Value to Key,{Something}
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-697
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-697
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.5.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> When writing a custom iterator, the iterator often has some semantic
> knowledge of what each Key/Value being returned actually means (e.g., a word
> count iterator could be returning Key/Value pairs but is really returning an
> Integer/Long count in the Value). This forces the client to know what is going
> to be returned and to handle the cast/transformation itself.
> I believe it should be fairly straightforward to encapsulate this 
> transformation inside the Accumulo client code. I plan on investigating the 
> possibility of changing the ScannerBase impl, or perhaps making a 
> TypedScannerBase, in which the iterator at the "top" of the stack for a scan 
> can return something other than a Value to the client.
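A rough sketch of what a TypedScanner might look like on the client side, assuming String keys and raw byte[] values as stand-ins for Key and the bytes returned by Value.get(). TypedScanner, decodeLong and the decimal-string encoding of the count are all hypothetical; the point is only that the decode step moves out of application code and into the scanner.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a TypedScanner: keys are Strings, raw values are byte[]
// (stand-ins for Accumulo's Key and Value.get()); not actual Accumulo API.
public final class TypedScanner<T> implements Iterable<Map.Entry<String, T>> {
  private final Iterable<Map.Entry<String, byte[]>> raw;
  private final Function<byte[], T> decoder;

  public TypedScanner(Iterable<Map.Entry<String, byte[]>> raw, Function<byte[], T> decoder) {
    this.raw = raw;
    this.decoder = decoder;
  }

  @Override
  public Iterator<Map.Entry<String, T>> iterator() {
    Iterator<Map.Entry<String, byte[]>> it = raw.iterator();
    return new Iterator<Map.Entry<String, T>>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public Map.Entry<String, T> next() {
        Map.Entry<String, byte[]> e = it.next();
        // The decode step that the client currently has to write by hand.
        return new AbstractMap.SimpleImmutableEntry<>(e.getKey(), decoder.apply(e.getValue()));
      }
    };
  }

  // Example decoder, assuming a word-count iterator writes the count as a decimal string.
  public static Long decodeLong(byte[] bytes) {
    return Long.parseLong(new String(bytes, StandardCharsets.UTF_8));
  }
}
```

A caller would then write `new TypedScanner<>(rawEntries, TypedScanner::decodeLong)` and iterate over `Map.Entry<String, Long>` directly, with the knowledge of the value encoding kept next to the scanner rather than scattered through client code.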

