[
https://issues.apache.org/jira/browse/NIFI-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404509#comment-15404509
]
ASF GitHub Bot commented on NIFI-1868:
--------------------------------------
Github user bbende commented on the issue:
https://github.com/apache/nifi/pull/706
Latest update is looking good... one thing I noticed: if you send in an
Avro file that does not have the partition columns of the table, an
IOException is thrown around line 435 when trying to extract the partition
fields from the Avro schema. It then gets wrapped in a ProcessException and
thrown out of onTrigger, so the flow file sits in the incoming queue but can
never be processed.
Could we look for a ProcessException with a cause of IOException and route to
failure (similar to the connection error handling)? Or maybe create a specific
exception type to look for, since there could be other IOExceptions that we
want to bounce out of onTrigger?
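To illustrate the first suggestion, here is a minimal, self-contained sketch of unwrapping the cause and routing to failure. The ProcessException below is a stand-in for NiFi's org.apache.nifi.processor.exception.ProcessException, and the returned string stands in for what would really be session.transfer(flowFile, REL_FAILURE) inside onTrigger; both are assumptions for the sake of a runnable example, not the actual PR code.

```java
import java.io.IOException;

public class CauseRoutingSketch {
    // Minimal stand-in for NiFi's ProcessException (a RuntimeException).
    static class ProcessException extends RuntimeException {
        ProcessException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Returns "failure" when the ProcessException wraps an IOException
    // (e.g. partition fields missing from the Avro schema); otherwise
    // rethrows so other errors still bounce out of onTrigger as before.
    static String routeOrRethrow(ProcessException e) {
        if (e.getCause() instanceof IOException) {
            // In the processor this would be:
            // session.transfer(flowFile, REL_FAILURE);
            return "failure";
        }
        throw e;
    }

    public static void main(String[] args) {
        ProcessException ioWrapped = new ProcessException(
                "could not extract partition fields from Avro schema",
                new IOException("partition columns not found"));
        System.out.println(routeOrRethrow(ioWrapped)); // prints "failure"
    }
}
```

The second suggestion (a dedicated exception type) would avoid accidentally catching unrelated IOExceptions; the instanceof check above is the simpler but coarser option.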
> Add support for Hive Streaming
> ------------------------------
>
> Key: NIFI-1868
> URL: https://issues.apache.org/jira/browse/NIFI-1868
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Fix For: 1.0.0
>
>
> Traditionally adding new data into Hive requires gathering a large amount of
> data onto HDFS and then periodically adding a new partition. This is
> essentially a “batch insertion”. Insertion of new data into an existing
> partition is not permitted. The Hive Streaming API allows data to be pumped
> continuously into Hive. The incoming data can be continuously committed in
> small batches of records into an existing Hive partition or table. Once data
> is committed it becomes immediately visible to all Hive queries initiated
> subsequently.
> This case is to add a PutHiveStreaming processor to NiFi, to leverage the
> Hive Streaming API to allow continuous streaming of data into a Hive
> partition/table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)