[
https://issues.apache.org/jira/browse/NIFI-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404079#comment-15404079
]
ASF GitHub Bot commented on NIFI-1868:
--------------------------------------
Github user mattyb149 commented on the issue:
https://github.com/apache/nifi/pull/706
Per an offline discussion with @bbende, we decided a good way to handle the
outgoing flow files is to have "success", "failure", and "retry"
relationships. If a connection or environment error occurs, the incoming flow
file will be routed to "retry". Otherwise, all records put successfully to Hive
Streaming will go into an Avro record on the "success" relationship, and any
records that fail to be written will go into an Avro record on the "failure"
relationship.
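The routing rule described above can be sketched as follows. This is an
illustrative model only, not the actual NiFi processor code: `route_records`
and `write_record` are hypothetical names, and `write_record` stands in for
the real Hive Streaming write call.

```python
def route_records(records, write_record, connection_ok=True):
    """Partition records among the proposed relationships.

    write_record: callable returning True if the record was accepted by
    Hive Streaming, False otherwise (a stand-in for the real API call).
    connection_ok: False simulates a connection/environment error.
    """
    if not connection_ok:
        # Connection/environment error: the whole incoming flow file
        # is routed to "retry", untouched.
        return {"retry": list(records), "success": [], "failure": []}

    routed = {"retry": [], "success": [], "failure": []}
    for rec in records:
        # Successful writes accumulate on "success"; failed records
        # accumulate on "failure" (each bundled into an Avro record
        # in the real processor).
        routed["success" if write_record(rec) else "failure"].append(rec)
    return routed
```

For example, `route_records([1, 2, 3], lambda r: r != 2)` would place records
1 and 3 on "success" and record 2 on "failure", while any call with
`connection_ok=False` sends everything to "retry".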
> Add support for Hive Streaming
> ------------------------------
>
> Key: NIFI-1868
> URL: https://issues.apache.org/jira/browse/NIFI-1868
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Fix For: 1.0.0
>
>
> Traditionally adding new data into Hive requires gathering a large amount of
> data onto HDFS and then periodically adding a new partition. This is
> essentially a “batch insertion”. Insertion of new data into an existing
> partition is not permitted. The Hive Streaming API allows data to be pumped
> continuously into Hive. The incoming data can be continuously committed in
> small batches of records into an existing Hive partition or table. Once data
> is committed it becomes immediately visible to all Hive queries initiated
> subsequently.
> This issue is to add a PutHiveStreaming processor to NiFi that leverages the
> Hive Streaming API to allow continuous streaming of data into a Hive
> partition/table.
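The small-batch commit model described in the issue can be sketched as below.
This is a simplified illustration, not Hive Streaming code: `commit` stands in
for committing a transaction batch, after which that batch's records become
visible to subsequent queries.

```python
def stream_in_batches(records, commit, batch_size=2):
    """Buffer incoming records and commit them in small batches.

    commit: callable invoked with each full (or final partial) batch;
    a stand-in for committing a Hive Streaming transaction, which makes
    the batch visible to queries started afterwards.
    Returns the total number of records committed.
    """
    buf = []
    committed = 0
    for rec in records:
        buf.append(rec)
        if len(buf) == batch_size:
            commit(buf)          # batch is now "visible"
            committed += len(buf)
            buf = []
    if buf:                      # flush the trailing partial batch
        commit(buf)
        committed += len(buf)
    return committed
```

With `batch_size=2` and five records, three commits occur (two full batches
and one final partial batch), and all five records end up visible.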
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)