[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675777#comment-16675777 ]

ASF GitHub Bot commented on NIFI-5788:
--------------------------------------

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230917511
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                             }
                         }
                         ps.addBatch();
    +                    if (++currentBatchSize == batchSize) {
    --- End diff ---
    
    Would it be beneficial to capture `currentBatchSize*batchIndex` as an
    attribute, with `batchIndex` incremented only after a successful call to
    `executeBatch()`? My thinking is: if a failure occurs and only part of a
    batch was loaded, you could record how many rows were loaded successfully
    as an attribute.
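
    A minimal sketch of that idea, written as a plain JDBC loop outside the
    processor, might look like the following. The method, variable, and
    attribute names (e.g. `rows.loaded`) are illustrative only and are not
    taken from the PR.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;
    import java.util.Map;

    // Sketch: batchIndex is incremented only after executeBatch() succeeds, so on
    // failure batchSize * batchIndex is the number of rows from fully executed batches.
    class BatchProgressSketch {
        static void insertWithProgress(Connection conn, String sql, List<Object[]> rows,
                                       int batchSize, Map<String, String> attributes) throws SQLException {
            int currentBatchSize = 0;
            int batchIndex = 0;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Object[] row : rows) {
                    for (int i = 0; i < row.length; i++) {
                        ps.setObject(i + 1, row[i]);
                    }
                    ps.addBatch();
                    if (++currentBatchSize == batchSize) {
                        ps.executeBatch();
                        batchIndex++;            // counted only after a successful executeBatch()
                        currentBatchSize = 0;
                    }
                }
                if (currentBatchSize > 0) {
                    ps.executeBatch();           // flush the final, partial batch
                }
            } catch (SQLException e) {
                // Report how many rows were loaded by the batches that did succeed.
                attributes.put("rows.loaded", String.valueOf(batchSize * batchIndex));
                throw e;
            }
        }
    }

    Whether rows from a partially executed batch are kept or rolled back depends
    on the driver and the connection's auto-commit setting, so such a count would
    only reflect batches that executeBatch() completed without error.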


> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
>                 Key: NIFI-5788
>                 URL: https://issues.apache.org/jira/browse/NIFI-5788
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>         Environment: Teradata DB
>            Reporter: Vadim
>            Priority: Major
>              Labels: pull-request-available
>
> Certain JDBC drivers do not support unlimited batch sizes in INSERT/UPDATE 
> prepared SQL statements. Specifically, the Teradata JDBC driver 
> ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) fails 
> the SQL statement when the batch overflows its internal limits.
> Dividing the data into smaller chunks before PutDatabaseRecord is applied 
> can work around the issue in some scenarios, but in general this is not a 
> good solution because the SQL statements would be executed in different 
> transaction contexts and data integrity would not be preserved.
> The suggested solution is the following:
>  * introduce a new optional parameter in the *PutDatabaseRecord* processor, 
> *batch_size*, which defines the maximum size of a batch in an INSERT/UPDATE 
> statement; its default value of -1 (INFINITY) preserves the old behavior 
> (see the property sketch below the issue description)
>  * divide the input into batches of the specified size and invoke 
> PreparedStatement.executeBatch() for each batch
> Pull request: [https://github.com/apache/nifi/pull/3128]
>  
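
For illustration only, an optional batch-size property along the lines described
above might be declared like this in a NiFi processor. The property name,
display name, description, and default value here are assumptions made for the
sketch, not necessarily what PR 3128 actually uses.

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.processor.util.StandardValidators;

    // Illustrative sketch: an optional batch size property defaulting to -1,
    // i.e. "no limit", which preserves the previous single-batch behavior.
    public class BatchSizePropertySketch {
        static final PropertyDescriptor MAX_BATCH_SIZE = new PropertyDescriptor.Builder()
                .name("put-db-record-max-batch-size")
                .displayName("Maximum Batch Size")
                .description("Maximum number of statements to add to a single JDBC batch "
                        + "before calling executeBatch(); -1 means no limit.")
                .defaultValue("-1")
                .required(false)
                .addValidator(StandardValidators.INTEGER_VALIDATOR)
                .build();
    }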



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
