[ https://issues.apache.org/jira/browse/NIFI-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Turcsanyi updated NIFI-7989:
----------------------------------
    Fix Version/s: 1.13.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Add Hive "data drift" processor
> -------------------------------
>
>                 Key: NIFI-7989
>                 URL: https://issues.apache.org/jira/browse/NIFI-7989
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.13.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It would be nice to have a Hive processor (one for each Hive NAR) that could
> check an incoming record-based flowfile against a destination table, and
> either add columns and/or partition values, or even create the table if it
> does not exist. Such a processor could be used in a flow where the incoming
> data's schema can change and we want to be able to write it to a Hive table,
> preferably by using PutHDFS, PutParquet, or PutORC to place it directly where
> it can be queried.
> Such a processor should be able to use a HiveConnectionPool to execute any
> DDL (ALTER TABLE ADD COLUMN, e.g.) necessary to make the table match the
> incoming data. For Partition Values, they could be provided via a property
> that supports Expression Language. In such a case, an ALTER TABLE would be
> issued to add the partition directory.
> Whether the table is created or updated, and whether there are partition
> values to consider, an attribute should be written to the outgoing flowfile
> corresponding to the location of the table (and any associated partitions).
> This supports the idea of having a flow that updates a Hive table based on
> the incoming data, and then allows the user to put the flowfile directly into
> the destination location (PutHDFS, e.g.) instead of having to load it using
> HiveQL or being subject to the restrictions of Hive Streaming tables
> (ORC-backed, transactional, etc.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
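
For readers following the issue, the schema check it describes can be sketched roughly as below. This is only an illustration in Python, not the NiFi implementation (which is Java and runs its DDL through a HiveConnectionPool). The function names `drift_ddl`, `quote`, and `literal` are hypothetical, as is the choice to type new partition columns as STRING. The sketch compares the incoming record's fields with the table's columns and returns the Hive DDL that would be needed.

```python
from typing import Dict, List, Optional


def quote(identifier: str) -> str:
    """Backtick-quote a Hive identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def literal(value: str) -> str:
    """Render a partition value as a single-quoted HiveQL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def drift_ddl(
    table: str,
    table_columns: Optional[Dict[str, str]],
    record_fields: Dict[str, str],
    partition: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the DDL needed for `table` to accept `record_fields`.

    table_columns maps existing column names to types, or is None when the
    table does not exist. Names are compared case-insensitively.
    """
    partition = partition or {}
    partition_keys = {name.lower() for name in partition}
    statements: List[str] = []

    if table_columns is None:
        # Table is absent: create it from the record schema.
        cols = ", ".join(
            f"{quote(name)} {hive_type}"
            for name, hive_type in record_fields.items()
            if name.lower() not in partition_keys
        )
        ddl = f"CREATE TABLE IF NOT EXISTS {quote(table)} ({cols})"
        if partition:
            # Assumption: partition columns are typed STRING.
            parts = ", ".join(f"{quote(name)} STRING" for name in partition)
            ddl += f" PARTITIONED BY ({parts})"
        statements.append(ddl)
    else:
        # Table exists: add only the fields it does not already have.
        existing = {name.lower() for name in table_columns}
        missing = [
            f"{quote(name)} {hive_type}"
            for name, hive_type in record_fields.items()
            if name.lower() not in existing
            and name.lower() not in partition_keys
        ]
        if missing:
            statements.append(
                f"ALTER TABLE {quote(table)} ADD COLUMNS ({', '.join(missing)})"
            )

    if partition:
        spec = ", ".join(
            f"{quote(name)}={literal(value)}" for name, value in partition.items()
        )
        statements.append(
            f"ALTER TABLE {quote(table)} ADD IF NOT EXISTS PARTITION ({spec})"
        )
    return statements
```

With an existing table that has only an `id` column, a record carrying `id` and `Email`, and a partition value `dt=2020-11-05`, the function returns one ALTER TABLE ... ADD COLUMNS statement followed by one ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement. A record that matches the table and has no partition values yields an empty list.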