[ 
https://issues.apache.org/jira/browse/NIFI-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743516#comment-14743516
 ] 

Alan Jackoway commented on NIFI-817:
------------------------------------

I think it would be good to do a quick survey of what other NoSQL databases we 
will want to support (I would guess Cassandra, Mongo, etc.), and see if at 
least the base elements here will work for each of them. Specifically, does the 
row serialization here make sense in those databases? Does each of them have a
concept of getting rows in a time range? That way at least the schema coming 
out of each NoSQL processor and the get behavior will be similar. If you use 
Avro instead of JSON as the intermediate format, you could even enforce some 
kind of record format for each of these.
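
To make the Avro suggestion concrete, one possible shared record shape for rows coming out of any NoSQL "get" processor is sketched below. This is purely illustrative, not part of the attached patch; the NoSqlRow/Cell names and fields are assumptions, with "family" nullable since non-HBase stores may not have column families:

```json
{
  "type": "record",
  "name": "NoSqlRow",
  "fields": [
    {"name": "rowKey", "type": "bytes"},
    {"name": "cells", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Cell",
        "fields": [
          {"name": "family", "type": ["null", "string"], "default": null},
          {"name": "qualifier", "type": "string"},
          {"name": "timestamp", "type": "long"},
          {"name": "value", "type": "bytes"}
        ]
      }
    }}
  ]
}
```

A schema along these lines would let downstream processors consume rows from HBase, Cassandra, Mongo, etc. without caring which store produced them.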

I suspect that coming up with a common read schema won't be too hard, but doing 
something similar for writing data will be more difficult.

If the goal is to support incremental processing within a single NoSQL system, 
the Google Percolator paper is probably relevant: 
http://research.google.com/pubs/pub36726.html, but it would take quite a bit
more engineering and would likely be hard to standardize across different
NoSQL systems.

I might rename this processor from GetHBase to ScanIncrementalHBase. I suspect 
there may be use cases later on for scheduled full-table scans or explicit 
by-row get operations.

> Create Processors to interact with HBase
> ----------------------------------------
>
>                 Key: NIFI-817
>                 URL: https://issues.apache.org/jira/browse/NIFI-817
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.4.0
>
>         Attachments: 
> 0001-NIFI-817-Initial-implementation-of-HBase-processors.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
