Hi Paul,

In the scenario where you need complex joins and incremental loads, your best 
bet is probably to create a view in your database containing the required 
query. QueryDatabaseTable can then operate against this view, as long as the 
view exposes a suitable 'id' column. 

That would provide a workaround for the current lack of a custom-query option. 
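To make the workaround concrete, here is a minimal sketch using Python's sqlite3 module (the schema, view name, and tables are hypothetical, not from the thread). It shows a view that flattens a join while exposing an `id` column, and a polling function that mimics what QueryDatabaseTable does: remember the maximum id it has seen and fetch only rows beyond it on the next run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: addresses joined to customers.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT);

-- The view hides the join; its 'id' column gives the poller a
-- monotonically increasing key to track between runs.
CREATE VIEW customer_addresses AS
  SELECT a.id AS id, c.name AS name, a.city AS city
  FROM addresses a JOIN customers c ON c.id = a.customer_id;

INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO addresses VALUES (10, 1, 'Oslo'), (11, 2, 'Bergen');
""")

def fetch_new(last_seen_id):
    # Return only rows past the stored max id, plus the new max id.
    rows = cur.execute(
        "SELECT id, name, city FROM customer_addresses "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    return rows, (rows[-1][0] if rows else last_seen_id)

rows, state = fetch_new(0)        # first run: all rows
rows2, state2 = fetch_new(state)  # second run: nothing new yet
```

The same pattern applies to PostgreSQL: CREATE VIEW over the join, then point QueryDatabaseTable at the view with the id column as the maximum-value column.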

I also note on your ticket an excellent point about limiting the returned 
batch size. You can approximate this with the query max-time setting, but it 
would certainly be a good addition (and maybe deserves its own ticket).

Simon

> On 31 Mar 2016, at 10:15, Paul Bormans <[email protected]> wrote:
> 
> Hi Joe,
> 
> Thank you for the tip: great!!!
> 
> I guess NiFi is still too new, because I did some extensive searching on
> this subject and QueryDatabaseTable was not mentioned...
> 
> This processor does exactly what I expect/need!
> 
> One shortcoming... maybe I should enter a ticket for this. Extraction of
> data from an RDBMS usually involves complex queries with joins, and these
> are not supported as far as I can see. We could also extend the processor
> with a configuration option to specify the full query, which I believe is
> much more flexible than enumerating columns from a specific table.
> 
> Paul
> 
> 
> 
> On Wed, Mar 30, 2016 at 5:19 PM, Joe Witt <[email protected]> wrote:
> 
>> Paul,
>> 
>> In Apache NiFi 0.6.0 if you're looking for a change capture type
>> mechanism to source from relational databases take a look at
>> QueryDatabaseTable [1].
>> 
>> That processor is new, and any feedback and/or contributions for it would
>> be awesome.
>> 
>> ExecuteSQL does have some time-driven use cases, to capture snapshots
>> and such, but you're right that it doesn't sound like a good fit for
>> your case.
>> 
>> [1]
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html
>> 
>> On Wed, Mar 30, 2016 at 9:11 AM, Paul Bormans <[email protected]> wrote:
>>> I'm evaluating Apache NiFi as a data ingestion tool to load data from an
>>> RDBMS into S3. A first test shows odd behavior: the same rows are written
>>> to the flowfile over and over again, while I expected only new rows to be
>>> written.
>>> 
>>> In fact, I was missing configuration options to specify which column could
>>> be used to query only for new rows.
>>> 
>>> Taking a look at the processor implementation makes me believe that the
>>> only option is to define a query including OFFSET n LIMIT m, where "n" is
>>> dynamically set based upon previous onTrigger calls; would this even be
>>> possible?
>>> 
>>> Some setup info:
>>> nifi: 0.6.0
>>> backend: postgresql
>>> driver: postgresql-9.4.1208.jre6.jar
>>> query: select * from addresses
>>> 
>>> More generally, I don't see a use case where the current ExecuteSQL
>>> processor fits as a processor (without an input flowfile). Can someone
>>> explain?
>>> 
>>> Paul
>> 
