[
https://issues.apache.org/jira/browse/NIFI-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743516#comment-14743516
]
Alan Jackoway commented on NIFI-817:
------------------------------------
I think it would be good to do a quick survey of what other NoSQL databases we
will want to support (I would guess Cassandra, Mongo, etc.) and see if at least
the base elements here will work for each of them. Specifically, does the row
serialization here make sense in those databases? Does each of them have a
concept of getting rows in a time range? That way at least the schema coming
out of each NoSQL processor and the get behavior will be similar. If you use
Avro instead of JSON as the intermediate format, you could even enforce some
kind of record format for each of these.
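To make the record-format idea concrete, a shared Avro schema might look
roughly like the sketch below. This is purely illustrative: the record and
field names (NoSqlRow, Cell, rowKey, etc.) are hypothetical, not a proposal
from this patch, and "family" is nullable because stores like Mongo and
Cassandra have no direct equivalent of HBase column families.

```json
{
  "type": "record",
  "name": "NoSqlRow",
  "fields": [
    {"name": "rowKey",    "type": "bytes"},
    {"name": "timestamp", "type": "long"},
    {"name": "cells", "type": {"type": "array", "items": {
      "type": "record",
      "name": "Cell",
      "fields": [
        {"name": "family",    "type": ["null", "string"], "default": null},
        {"name": "qualifier", "type": "string"},
        {"name": "value",     "type": "bytes"},
        {"name": "timestamp", "type": "long"}
      ]
    }}}
  ]
}
```

Each NoSQL get/scan processor would then emit FlowFiles conforming to one
schema, which is what would keep the read behavior uniform across systems.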
I suspect that coming up with a common read schema won't be too hard, but doing
something similar for writing data will be more difficult.
If the goal is to support incremental processing within a single NoSQL system,
the Google Percolator paper is probably relevant:
http://research.google.com/pubs/pub36726.html. However, it would take quite a
bit more engineering and would likely be hard to standardize across different
NoSQL systems.
I might rename this processor from GetHBase to ScanIncrementalHBase. I suspect
there may be use cases for scheduled full-table scans or explicit by-row get
operations later on.
> Create Processors to interact with HBase
> ----------------------------------------
>
> Key: NIFI-817
> URL: https://issues.apache.org/jira/browse/NIFI-817
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 0.4.0
>
> Attachments:
> 0001-NIFI-817-Initial-implementation-of-HBase-processors.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)