[ https://issues.apache.org/jira/browse/NIFI-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404766#comment-15404766 ]

ASF GitHub Bot commented on NIFI-1868:
--------------------------------------

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/706
  
    Error handling on the relationships is looking good, thanks for making
those updates.

    One more thing I ran into: with Concurrent Tasks set to 2 on the
PutHiveStreaming processor and a flow file generated every 500 ms, I would
occasionally get the following error:
    
    ```
    2016-08-02 20:50:17,860 ERROR [Timer-Driven Process Thread-1] o.a.n.processors.hive.PutHiveStreaming
    java.lang.NullPointerException: null
        at org.apache.hive.hcatalog.streaming.StrictJsonWriter.write(StrictJsonWriter.java:79) ~[na:na]
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.write(HiveEndPoint.java:632) ~[na:na]
        at org.apache.nifi.util.hive.HiveWriter$1.call(HiveWriter.java:113) ~[na:na]
        at org.apache.nifi.util.hive.HiveWriter$1.call(HiveWriter.java:110) ~[na:na]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
    ```


> Add support for Hive Streaming
> ------------------------------
>
>                 Key: NIFI-1868
>                 URL: https://issues.apache.org/jira/browse/NIFI-1868
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>             Fix For: 1.0.0
>
>
> Traditionally, adding new data into Hive requires gathering a large amount of 
> data onto HDFS and then periodically adding a new partition. This is 
> essentially a “batch insertion”; insertion of new data into an existing 
> partition is not permitted. The Hive Streaming API allows data to be pumped 
> continuously into Hive: incoming data can be committed in small batches of 
> records into an existing Hive partition or table, and once committed it 
> becomes immediately visible to all Hive queries initiated subsequently.
> This case is to add a PutHiveStreaming processor to NiFi that leverages the 
> Hive Streaming API to allow continuous streaming of data into a Hive 
> partition/table.
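For context, the Hive Streaming write path that the processor wraps (and that appears in the stack trace above) follows the pattern below. This is only a minimal sketch: the metastore URI, database, table, and partition values are placeholders, and the class and method names come from the `org.apache.hive.hcatalog.streaming` package.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: metastore URI, database, table, partition values.
        List<String> partitionVals = Arrays.asList("2016-08-02");
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "events", partitionVals);

        // Open a connection, creating the partition if it does not exist,
        // and attach a JSON record writer for this endpoint.
        StreamingConnection connection = endPoint.newConnection(true);
        StrictJsonWriter writer = new StrictJsonWriter(endPoint);
        try {
            // Fetch a batch of transactions; records are committed in small
            // groups, becoming visible to Hive queries on each commit.
            TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
            try {
                txnBatch.beginNextTransaction();
                txnBatch.write("{\"id\": 1, \"msg\": \"hello\"}".getBytes("UTF-8"));
                txnBatch.commit();
            } finally {
                txnBatch.close();
            }
        } finally {
            connection.close();
        }
    }
}
```

A `TransactionBatch` is not safe for concurrent use, which may be relevant to the NPE reported above when the processor runs with two concurrent tasks.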



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
