[ 
https://issues.apache.org/jira/browse/NIFI-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shcherbakov updated NIFI-6934:
-------------------------------------
    Description: 
The PutDatabaseRecord processing is setup for batch record processing for more 
performant records processing and to be able to use JDBC batch API.

Unfortunately, in this setup, in case if a single record in the batch fails to 
be inserted into the target SQL database, the entire transaction with the full 
DML batch gets rolled back.
 That is, a single failure (e.g. because of a duplicate existing in the 
database with the same unique business key constraint) makes using 
PutDatabaseRecord processor unusable in batch mode.

A common workaround for that is inserting a SplitRecord before the 
PutDatabaseProcessor. That approach works but obviously has disadvantages (slow 
performance, higher resource consumption, downstream logic changes, since the 
number of FlowFiles explodes).

The PostgresSQL starting with version 9.5 supports a special SQL syntax 
extension that effectively allows UPSERTS, that is, adding new records in case 
of no constraint conflicts and updating existing record (or doing nothing) in 
case if there is a conflict:
 [http://www.postgresqltutorial.com/postgresql-upsert/]

Such an Upsert would solve the above mentioned problems of the 
PutDatabaseRecord processor.
 Adding support for such extension looks also fairly trivial in the 
PutDatabaseRecord processor.

 

 

  was:
The PutDatabaseRecord processing is setup for batch record processing for more 
performant records processing and to be able to use JDBC batch API.

Unfortunately, in this setup, in case if a single record in the batch fails to 
be inserted into the target SQL database, the entire transaction with the full 
DML batch gets rolled back.
That is, a single failure (e.g. because of a duplicate existing in the database 
with the same unique business key constraint) makes using PutDatabaseRecord 
processor unusable in batch mode.

A common workaround for that is inserting a SplitRecord before the 
PutDatabaseProcessor. That approach works but obviously has disadvantages (slow 
performance, higher resource consumption, downstream logic changes, since the 
number of FlowFiles explodes).

The PostgresSQL starting with version 9.5 support special SQL syntax extension 
that effectively allows UPSERTS, that is, adding new records in case of no 
constraint conflicts and updating existing record (or doing nothing) in case if 
there is a conflict:
[http://www.postgresqltutorial.com/postgresql-upsert/]


Such an Upsert would solve the above mentioned problems of the 
PutDatabaseRecord processor.
Adding support for such extension looks also fairly trivial in the 
PutDatabaseRecord processor.



 

 


> Support Postgres 9.5+ Upsert in the standard PutDatabaseRecord processor
> ------------------------------------------------------------------------
>
>                 Key: NIFI-6934
>                 URL: https://issues.apache.org/jira/browse/NIFI-6934
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.10.0
>            Reporter: Sergey Shcherbakov
>            Priority: Major
>
> The PutDatabaseRecord processing is setup for batch record processing for 
> more performant records processing and to be able to use JDBC batch API.
> Unfortunately, in this setup, in case if a single record in the batch fails 
> to be inserted into the target SQL database, the entire transaction with the 
> full DML batch gets rolled back.
>  That is, a single failure (e.g. because of a duplicate existing in the 
> database with the same unique business key constraint) makes using 
> PutDatabaseRecord processor unusable in batch mode.
> A common workaround for that is inserting a SplitRecord before the 
> PutDatabaseProcessor. That approach works but obviously has disadvantages 
> (slow performance, higher resource consumption, downstream logic changes, 
> since the number of FlowFiles explodes).
> The PostgresSQL starting with version 9.5 supports a special SQL syntax 
> extension that effectively allows UPSERTS, that is, adding new records in 
> case of no constraint conflicts and updating existing record (or doing 
> nothing) in case if there is a conflict:
>  [http://www.postgresqltutorial.com/postgresql-upsert/]
> Such an Upsert would solve the above mentioned problems of the 
> PutDatabaseRecord processor.
>  Adding support for such extension looks also fairly trivial in the 
> PutDatabaseRecord processor.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to