[ https://issues.apache.org/jira/browse/NIFI-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658322#comment-14658322 ]
Toivo Adams commented on NIFI-293:
----------------------------------
Good idea.
We can even create the SQL update/insert statements dynamically.
This should be possible using Avro schema information and metadata from JDBC.
We need the table name, which we can extract from the Avro schema.
Next we can ask JDBC for the table metadata. This gives us all column names and
types.
It also gives us the primary key column names.
And it tells us which columns are mandatory (cannot be NULL in the database).
We expect the Avro schema name to equal the database table name,
and the Avro field names to equal the database table column names.
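
A minimal sketch of the metadata lookup described above, using only the standard
java.sql.DatabaseMetaData calls. The class name TableMetadataSketch and its
fields are hypothetical, and it assumes the table name is passed exactly as the
database stores it (many databases upper-case unquoted names).

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for what JDBC reports about one table.
final class TableMetadataSketch {
    // Column name -> java.sql.Types code, in the order JDBC returns them.
    final Map<String, Integer> columnTypes = new LinkedHashMap<>();
    final List<String> primaryKeys = new ArrayList<>();
    final List<String> mandatoryColumns = new ArrayList<>();

    // A column is treated as mandatory when JDBC reports it as NOT NULL.
    static boolean isMandatory(int nullableFlag) {
        return nullableFlag == DatabaseMetaData.columnNoNulls;
    }

    static TableMetadataSketch read(DatabaseMetaData meta, String tableName)
            throws SQLException {
        TableMetadataSketch info = new TableMetadataSketch();
        // tableName is treated as a LIKE pattern by JDBC; "%" matches all columns.
        try (ResultSet cols = meta.getColumns(null, null, tableName, "%")) {
            while (cols.next()) {
                String name = cols.getString("COLUMN_NAME");
                info.columnTypes.put(name, cols.getInt("DATA_TYPE"));
                if (isMandatory(cols.getInt("NULLABLE"))) {
                    info.mandatoryColumns.add(name);
                }
            }
        }
        // Rows come back ordered by column name, not by position in the key.
        try (ResultSet keys = meta.getPrimaryKeys(null, null, tableName)) {
            while (keys.next()) {
                info.primaryKeys.add(keys.getString("COLUMN_NAME"));
            }
        }
        return info;
    }
}
```

With a live connection this would be called as
TableMetadataSketch.read(conn.getMetaData(), schemaName), where schemaName is
the name taken from the Avro schema.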
1. Update
Avro data must contain the primary key values; otherwise an update is not
possible.
Other columns are optional, meaning we will update only the columns which are
present in the Avro data.
The SQL update statement can be built dynamically.
2. Insert
Hopefully the database table's primary keys are some sort of auto-increment type.
In this case the database will automatically create new primary key values.
Avro data should contain at least the mandatory column values.
The SQL insert statement can be built dynamically.
Using a prepared statement and a bigger batch size, insert/update should be fast.
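
The update and insert cases above could be sketched roughly like this. It is
only an illustration, not a proposed processor implementation: the class
DynamicSqlSketch and its method names are hypothetical, records are assumed to
be already converted from Avro into plain maps, and table/column names are
assumed to come from JDBC metadata (they are concatenated without quoting).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class DynamicSqlSketch {

    // UPDATE: set only the columns present in the record, keyed by primary key.
    // Parameter order is all SET columns first, then the primary key columns.
    static String buildUpdate(String table, List<String> presentColumns,
                              List<String> primaryKeys) {
        if (primaryKeys.isEmpty() || !presentColumns.containsAll(primaryKeys)) {
            throw new IllegalArgumentException("record must contain all primary key values");
        }
        List<String> assignments = new ArrayList<>();
        for (String col : presentColumns) {
            if (!primaryKeys.contains(col)) {
                assignments.add(col + " = ?");
            }
        }
        if (assignments.isEmpty()) {
            throw new IllegalArgumentException("no non-key columns to update");
        }
        List<String> conditions = new ArrayList<>();
        for (String key : primaryKeys) {
            conditions.add(key + " = ?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + String.join(" AND ", conditions);
    }

    // INSERT: one placeholder per supplied column; auto-increment keys are omitted.
    static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    // Reuse one prepared statement and flush every batchSize rows.
    static void insertBatch(Connection conn, String table, List<String> columns,
                            List<Map<String, Object>> records, int batchSize)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(buildInsert(table, columns))) {
            int pending = 0;
            for (Map<String, Object> record : records) {
                for (int i = 0; i < columns.size(); i++) {
                    stmt.setObject(i + 1, record.get(columns.get(i)));
                }
                stmt.addBatch();
                if (++pending == batchSize) {
                    stmt.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                stmt.executeBatch(); // flush the final partial batch
            }
        }
    }
}
```

For a hypothetical table users with primary key id, a record holding id, name
and email gives "UPDATE users SET name = ?, email = ? WHERE id = ?", and
inserting name and email gives "INSERT INTO users (name, email) VALUES (?, ?)".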
Thanks
toivo
> Add a JDBC Processor for executing arbitrary SQL queries
> --------------------------------------------------------
>
> Key: NIFI-293
> URL: https://issues.apache.org/jira/browse/NIFI-293
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Ricky Saltzer
> Attachments: AvroWriter.java
>
>
> This could be very useful for a variety of tasks, such as updating a value in
> a PostgreSQL table, or adding a new partition to Hive.
> Ideally, SQL commands could be generated using the NiFi expression language
> using FlowFile attributes.
> The processor should be as generic as possible so that any of the popular JDBC
> drivers can be used (e.g. PostgreSQL, Hive, Impala).
> I'm still new to how processors are architected, but it seems that using a
> pre-defined service in the _services.xml_ file (like the distributed map
> cache) would be the most efficient way to share a connection pool across
> multiple JDBC processors.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)