[ https://issues.apache.org/jira/browse/NIFI-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486207#comment-14486207 ]

Ryan Blue commented on NIFI-293:
--------------------------------

Good to hear; it would be a really valuable processor. I've been thinking about 
it lately, but I don't think I have time to build a prototype myself. If you 
guys decide to build one, check out the Apache Sqoop project.

Sqoop version 1 was a MapReduce-based (MR) tool for pulling data from databases 
into Hadoop. Sqoop version 2 changes that by adding a new connector API that 
lets you pull from databases without being tied to MR or Hadoop. Using that 
API, you could easily build a processor that uses any Sqoop connector to pull 
data from a database. Those connectors can be tailored to a particular database 
so they run very quickly, they already produce Avro as their intermediate 
format (we're working on the types though), and they can also take care of 
partitioning the work into chunks that can be done in parallel. I think the big 
challenge would be coordinating those tasks on a NiFi cluster, but you could 
add a service to handle that.
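
To make the partitioning idea concrete, here is a minimal sketch in plain Java 
of splitting a numeric key range into chunks that could each be queried 
independently. This is not the Sqoop connector API; RangePartitioner, Range, 
split, and toPredicate are hypothetical names used only for illustration, and 
the sketch assumes a numeric key column whose minimum and maximum values are 
already known.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Sqoop or NiFi. */
final class RangePartitioner {

    /** One half-open key range: lower <= key < upper. */
    static final class Range {
        final long lower;
        final long upper;

        Range(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        /** WHERE-clause fragment; assumes the column name is trusted. */
        String toPredicate(String column) {
            return column + " >= " + lower + " AND " + column + " < " + upper;
        }
    }

    /**
     * Splits the inclusive key range [min, max] into at most `chunks`
     * contiguous, non-overlapping ranges of near-equal size.
     * Assumes max - min + 1 fits in a long.
     */
    static List<Range> split(long min, long max, int chunks) {
        if (chunks < 1 || max < min) {
            throw new IllegalArgumentException("need chunks >= 1 and max >= min");
        }
        long total = max - min + 1;
        if (total <= 0) {
            throw new IllegalArgumentException("key range too large");
        }
        long n = Math.min(chunks, total); // never more chunks than keys
        long base = total / n;
        long remainder = total % n;

        List<Range> ranges = new ArrayList<>();
        long lower = min;
        for (long i = 0; i < n; i++) {
            // The first `remainder` chunks take one extra key each.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(lower, lower + size));
            lower += size;
        }
        return ranges;
    }
}
```

For example, split(1, 10, 3) yields the ranges [1, 5), [5, 8), and [8, 11), 
and each range's toPredicate("id") could be appended to a query run by a 
separate task. Deciding which node runs which range is the coordination 
problem mentioned above and is not addressed by this sketch.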

Definitely check it out, and I'll make sure we get you info if you have any 
questions on it.

> Add a JDBC Processor for executing arbitrary SQL queries
> --------------------------------------------------------
>
>                 Key: NIFI-293
>                 URL: https://issues.apache.org/jira/browse/NIFI-293
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Ricky Saltzer
>         Attachments: AvroWriter.java
>
>
> This could be very useful for a variety of tasks, such as updating a value in 
> a PostgreSQL table, or adding a new partition to Hive. 
> Ideally, SQL commands could be generated with the NiFi expression language 
> from FlowFile attributes. 
> The processor should be as generic as possible so that any of the popular 
> JDBC drivers can be used (e.g. PostgreSQL, Hive, Impala). 
> I'm still new to how processors are architected, but it seems that using a 
> pre-defined service in the _services.xml_ file (like the distributed map 
> cache) would be the most efficient way to share a connection pool across 
> multiple JDBC processors. 
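
One way to picture building SQL from FlowFile attributes is the sketch below. 
It is plain Java and JDBC only: it does not use the NiFi expression language 
or any NiFi API, the attributes are modeled as a simple Map, and SqlTemplate, 
bind, and prepare are hypothetical names. Instead of pasting attribute values 
into the SQL text, it turns each ${name} placeholder into a JDBC "?" parameter 
so the driver handles quoting.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not a NiFi class. */
final class SqlTemplate {
    // Matches ${name}; a stand-in for real expression-language parsing.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.-]+)\\}");

    final String sql;              // SQL text with "?" markers
    final List<String> parameters; // values in marker order

    private SqlTemplate(String sql, List<String> parameters) {
        this.sql = sql;
        this.parameters = Collections.unmodifiableList(parameters);
    }

    /**
     * Replaces each ${name} with "?" and records the attribute value.
     * Limitation: only values can be bound this way, not table or
     * column names, and placeholders inside SQL string literals are
     * replaced too.
     */
    static SqlTemplate bind(String template, Map<String, String> attributes) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        List<String> params = new ArrayList<>();
        while (m.find()) {
            String name = m.group(1);
            if (!attributes.containsKey(name)) {
                throw new IllegalArgumentException("Missing attribute: " + name);
            }
            params.add(attributes.get(name));
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        return new SqlTemplate(out.toString(), params);
    }

    /** Creates a statement on any JDBC connection and binds the values. */
    PreparedStatement prepare(Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(sql);
        try {
            for (int i = 0; i < parameters.size(); i++) {
                ps.setString(i + 1, parameters.get(i)); // JDBC is 1-indexed
            }
        } catch (SQLException e) {
            ps.close();
            throw e;
        }
        return ps;
    }
}
```

Because prepare takes a plain java.sql.Connection, the same code would work 
with whichever driver or shared connection pool supplies that connection; how 
the pool is exposed as a service is left open here.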



