GitHub user ryaneleary commented on the pull request:

    https://github.com/apache/accumulo/pull/7#issuecomment-43555087
  
    Sure. Here's an example of what I've done (by modifying the 
WholeRowIterator). Assume a table that contains 'documents' and various 
document information, laid out like this 
(format: ```<row <cf : cq>> (type) value```):
    ```
    <docid1 <META : timestamp>> (long) 12345
    <docid1 <META : meta_field_1>> (String) metaFieldValue1
    <docid1 <META : meta_field_2>> (String) metaFieldValue2
    <docid2 <META : timestamp>> (long) 23456
    ```
    
    For other reasons in the system, I have a protocol buffer definition that 
can be used for shipping this data around. Its definition looks like this:
    ```
    message DocumentMessage {
        optional int64 timestamp = 1;
        optional string meta_field_1 = 2;
        optional string meta_field_2 = 3;
    }
    ```
    
    I can define an iterator that is passed a generated protocol buffer class, 
iterates over the whole row, and automatically builds the protocol buffer 
object from the row's columns. getTopValue() then returns a protocol buffer 
message that the application knows how to parse.
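    
    To make the row-folding step concrete, here is a minimal sketch in plain 
Java with no Accumulo or protobuf dependency. It is not the actual iterator: 
`Entry` and `Document` are hypothetical stand-ins for Accumulo Key/Value pairs 
and the generated `DocumentMessage` class, `foldRows` is a made-up name, and 
values are kept as strings for simplicity. It only shows how sorted column 
entries could be collapsed into one object per row.
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch only: Entry and Document are hypothetical stand-ins, not Accumulo
    // or protobuf types.
    final class RowToDocumentSketch {

        /** Stand-in for one Accumulo Key/Value pair (row, cf, cq, value). */
        static final class Entry {
            final String row, family, qualifier, value;

            Entry(String row, String family, String qualifier, String value) {
                this.row = row;
                this.family = family;
                this.qualifier = qualifier;
                this.value = value;
            }
        }

        /** Stand-in for the generated DocumentMessage; unset fields stay null. */
        static final class Document {
            final String docId;
            Long timestamp;
            String metaField1;
            String metaField2;

            Document(String docId) {
                this.docId = docId;
            }
        }

        /** Folds entries that are already sorted by row into one Document per row. */
        static List<Document> foldRows(List<Entry> sortedEntries) {
            List<Document> docs = new ArrayList<>();
            Document current = null;
            for (Entry e : sortedEntries) {
                // A change in row id marks the start of the next document.
                if (current == null || !current.docId.equals(e.row)) {
                    current = new Document(e.row);
                    docs.add(current);
                }
                if (!"META".equals(e.family)) {
                    continue; // this sketch only maps the META column family
                }
                switch (e.qualifier) {
                    case "timestamp":
                        current.timestamp = Long.valueOf(e.value);
                        break;
                    case "meta_field_1":
                        current.metaField1 = e.value;
                        break;
                    case "meta_field_2":
                        current.metaField2 = e.value;
                        break;
                    default:
                        break; // qualifiers with no matching field are ignored
                }
            }
            return docs;
        }

        public static void main(String[] args) {
            List<Entry> entries = Arrays.asList(
                new Entry("docid1", "META", "timestamp", "12345"),
                new Entry("docid1", "META", "meta_field_1", "metaFieldValue1"),
                new Entry("docid1", "META", "meta_field_2", "metaFieldValue2"),
                new Entry("docid2", "META", "timestamp", "23456"));
            for (Document d : foldRows(entries)) {
                System.out.println(d.docId + " timestamp=" + d.timestamp);
            }
        }
    }
    ```
    
    In a real iterator the same loop would presumably run server-side over the 
columns of a single row, with the generated builder taking the place of 
`Document`.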
    
    This is nice for bulk loading and saves the client from having to build the 
message it would have had to build anyway. When the client iterates over the 
results returned by the scanner, it receives an entire document at a time 
instead of individual columns.
    
    I can provide a more concrete example/implementation of such a class if 
you'd like.


