[
https://issues.apache.org/jira/browse/NIFI-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011360#comment-17011360
]
Matt Burgess commented on NIFI-3623:
------------------------------------
I believe the expected (and intended) behavior is to move all flowfiles in the
batch to failure, as they are treated as a single transaction. You may be
looking for an UPSERT capability, which we don't have yet; many DBs approach
this via the SQL-standard MERGE statement, but that usually operates on two
existing tables rather than being a strict UPSERT (the latter is supported on
some DBs, such as Phoenix).
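For reference, PostgreSQL itself does offer a strict UPSERT (since 9.5) via
INSERT ... ON CONFLICT; a minimal sketch against the table layout from the
report below (A with columns date, id, value and primary key (date, id) — the
literal values here are illustrative only):

```sql
-- Strict UPSERT in PostgreSQL 9.5+: insert the row, or update it if the
-- primary key (date, id) already exists instead of failing the statement.
INSERT INTO A (date, id, value)
VALUES ('2020-01-08', 1, 42)
ON CONFLICT (date, id)
DO UPDATE SET value = EXCLUDED.value;
```

With statements shaped like this, a duplicate key no longer aborts the batch,
so the all-or-nothing transaction question doesn't arise in the first place.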
We would likely have to add a property for the behavior you describe, since
some folks will still want to treat a batch of (or fragmented) flowfiles as a
single transaction: if they expect all-or-nothing but we route to both success
and failure, the statements in a batch can end up being executed out of order,
missing expected rows, etc. We'd also need additional logic to weed out the
failed inserts, route them somewhere else, and then re-issue the batch we know
works. IMO retry is not the right relationship for flow files we know will
always fail, so we should still use failure for those.
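The "weed out the failed inserts" step hinges on what the JDBC driver reports
after a batch failure. The spec lets a driver either continue past a failed
statement (marking it EXECUTE_FAILED in the update counts) or abort the whole
batch, in which case the counts array is shorter than the batch and carries no
per-statement success info — PostgreSQL does the latter. A small sketch (class
and method names are hypothetical, not NiFi code) of interpreting the counts
attached to a BatchUpdateException:

```java
import java.sql.BatchUpdateException;
import java.sql.Statement;

public class BatchCounts {
    // Summarize a failed batch from the update counts the driver attached
    // to its BatchUpdateException.
    static String describe(int[] counts, int batchSize) {
        if (counts.length < batchSize) {
            // Driver aborted at the first failure (PostgreSQL's behavior):
            // statements past counts.length never ran, so we cannot tell
            // which individual statements would have succeeded.
            return "aborted after " + counts.length + " of " + batchSize;
        }
        int failed = 0;
        for (int c : counts) {
            if (c == Statement.EXECUTE_FAILED) {
                failed++;
            }
        }
        return failed + " failed, " + (batchSize - failed) + " succeeded";
    }

    public static void main(String[] args) {
        // A driver that continues past failures (JDBC permits this too):
        BatchUpdateException cont = new BatchUpdateException(
                "duplicate key", new int[] {1, Statement.EXECUTE_FAILED, 1});
        System.out.println(describe(cont.getUpdateCounts(), 3));
        // prints "1 failed, 2 succeeded"

        // PostgreSQL aborts the batch and reports a short (here empty) array:
        BatchUpdateException abort = new BatchUpdateException(
                "duplicate key", new int[] {});
        System.out.println(describe(abort.getUpdateCounts(), 3));
        // prints "aborted after 0 of 3"
    }
}
```

This is why PutSQL cannot reliably split a failed Postgres batch into success
and failure without re-running statements individually.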
> PutSQL can't insert multiple records in postgres if one causes an error
> -----------------------------------------------------------------------
>
> Key: NIFI-3623
> URL: https://issues.apache.org/jira/browse/NIFI-3623
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.2.0
> Reporter: Rostislav Koryakin
> Priority: Major
>
> PutSQL configured as follows:
> Concurrent tasks: 1
> Batch size: 100
> Obtain generated keys: false
> A flow is configured to take a snapshot of a particular database table and put
> it into postgres twice a day.
> Assume there is a postgres db table "A" with fields: date, id, value, and
> primary key (date, id).
> Sometimes due to network issues or restart of postgres not all entries are
> inserted.
> Example: "A" contains entries with id 1, 50, and 99.
> PutSQL consumes 100 flowfiles with id from 0 to 99 and tries to insert
> entries. It starts transaction, but the transaction is rolled back due to
> constraint violation for keys 1, 50 and 99. Next time the same situation is
> repeated.
> According to the PutSQL implementation: it expects the driver to return a list
> of succeeded and failed statements. But for postgres, all statements fail and
> all flowfiles go to failure instead of retry.
> There are several ways to solve it:
> 1) use batch size 1 (looks bad in terms of performance)
> 2) use obtain generated keys = true (there is no need for them)
> 3) Address the issue somehow and move 97 of 100 flowfiles to "retry" so they
> can be processed again.
> Is the expected behaviour in this situation to get 97 files in retry? Or is it
> normal that all go to "failure"?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)