[
https://issues.apache.org/jira/browse/ACCUMULO-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002352#comment-14002352
]
ASF GitHub Bot commented on ACCUMULO-2825:
------------------------------------------
Github user ryaneleary commented on the pull request:
https://github.com/apache/accumulo/pull/7#issuecomment-43555087
Sure. Here's an example of what I've done (by modifying the
WholeRowIterator). Assume a table that contains 'documents' and various
document information, laid out like this
(format: ```<row <cf : cq>> = value```):
```
<docid1 <META : timestamp>> (long) 12345
<docid1 <META : meta_field_1>> (String) metaFieldValue1
<docid1 <META : meta_field_2>> (String) metaFieldValue2
<docid2 <META : timestamp>> (long) 23456
```
For other reasons in the system, I have a protocol buffer definition that
can be used for shipping this data around. Its definition looks like this:
```
message DocumentMessage {
  optional int64 timestamp = 1;
  optional string meta_field_1 = 2;
  optional string meta_field_2 = 3;
}
```
I can define an iterator that is passed a generated protocol buffer class,
iterates over the whole row, and automatically builds the protocol buffer
object from the row's columns. getTopValue() then returns a protocol buffer
message that the application knows how to parse.
This is nice for bulk loading and saves the client from having to build a
message it would have had to build anyway. When iterating over the results
returned by the scanner, the client gets an entire document at a time instead
of individual columns.
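As a rough illustration of the row-to-message step, here is a minimal
plain-Java sketch. It deliberately does not use the Accumulo iterator API or
the generated protobuf classes: `Cell`, `DocumentMessage`, `RowEncoder`, and
`DocumentEncoder` are hypothetical stand-ins made up for this example, and
values are kept as strings rather than encoded bytes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch only: none of these types are Accumulo or protobuf classes.
final class RowEncodingSketch {

    /** Hypothetical stand-in for one key/value pair as a scan would return it. */
    static final class Cell {
        final String row, cf, cq, value;
        Cell(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    /** Hypothetical stand-in for the generated DocumentMessage class. */
    static final class DocumentMessage {
        Long timestamp;      // null models an unset optional field
        String metaField1;
        String metaField2;
    }

    /** Groups consecutive cells by row and hands each complete row to encodeRow. */
    abstract static class RowEncoder<T> {
        abstract T encodeRow(List<Cell> rowCells);

        // Assumes the cells arrive sorted by row, as scan results do.
        final Map<String, T> encodeAll(List<Cell> sortedCells) {
            Map<String, T> out = new LinkedHashMap<>();
            List<Cell> current = new ArrayList<>();
            for (Cell c : sortedCells) {
                if (!current.isEmpty() && !current.get(0).row.equals(c.row)) {
                    out.put(current.get(0).row, encodeRow(current));
                    current = new ArrayList<>();
                }
                current.add(c);
            }
            if (!current.isEmpty()) {
                out.put(current.get(0).row, encodeRow(current));
            }
            return out;
        }
    }

    /** Builds one DocumentMessage per row from the META column family. */
    static final class DocumentEncoder extends RowEncoder<DocumentMessage> {
        @Override
        DocumentMessage encodeRow(List<Cell> rowCells) {
            DocumentMessage msg = new DocumentMessage();
            for (Cell c : rowCells) {
                if (!"META".equals(c.cf)) continue;
                switch (c.cq) {
                    case "timestamp":    msg.timestamp = Long.parseLong(c.value); break;
                    case "meta_field_1": msg.metaField1 = c.value; break;
                    case "meta_field_2": msg.metaField2 = c.value; break;
                    default: break; // qualifier with no matching message field
                }
            }
            return msg;
        }
    }

    /** The four example entries from the table layout above. */
    static List<Cell> exampleCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("docid1", "META", "timestamp", "12345"));
        cells.add(new Cell("docid1", "META", "meta_field_1", "metaFieldValue1"));
        cells.add(new Cell("docid1", "META", "meta_field_2", "metaFieldValue2"));
        cells.add(new Cell("docid2", "META", "timestamp", "23456"));
        return cells;
    }

    public static void main(String[] args) {
        Map<String, DocumentMessage> docs = new DocumentEncoder().encodeAll(exampleCells());
        for (Map.Entry<String, DocumentMessage> e : docs.entrySet()) {
            System.out.println(e.getKey() + " -> timestamp=" + e.getValue().timestamp);
        }
    }
}
```

The only row-specific part is `encodeRow`; the grouping loop stays the same
for any encoding, which is roughly why only the encode/decode step needs to
be overridable.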
I can provide a more concrete example/implementation of such a class if
you'd like.
> WholeRowIterator should be extendable
> -------------------------------------
>
> Key: ACCUMULO-2825
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2825
> Project: Accumulo
> Issue Type: Improvement
> Components: client
> Affects Versions: 1.5.1, 1.6.0
> Reporter: Ryan Leary
> Assignee: Ryan Leary
> Priority: Minor
> Fix For: 1.6.1, 1.7.0
>
> Attachments: add_encode_row_iterator.patch
>
>
> It would be useful to be able to choose encodings other than what is
> implemented already in WholeRowIterator's encodeRow and decodeRow public
> static final methods.
> As an example, I wrote an iterator that reads in CQ/val pairs and
> automatically populates a protocol buffer. To do this, however, I essentially
> copy/pasted all of the WholeRowIterator source and changed the encode/decode
> methods.
> In the interest of not changing the WholeRowIterator API in any meaningful
> way (hopefully meaning this improvement could be added to 1.6.1), I have
> created a new abstract iterator: RowEncodingIterator, which WholeRowIterator
> now extends, implementing rowEncoder and rowDecoder.
--
This message was sent by Atlassian JIRA
(v6.2#6252)