[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648540#comment-15648540
 ] 

Dan Burkert commented on KUDU-1563:
-----------------------------------

[~mjacobs] brings up a good point that the duplicate-key constraint on insert 
is not the only constraint when writing to Kudu:

# duplicate primary-key constraint on insert
# missing primary-key constraint on delete and update
# missing range partition on any write
# missing column value in column without default on insert

Applications may want to 'ignore' any of these errors when writing to Kudu.  
Some of these errors are reported by the server (1, 2 and 4), and some are 
caught by the client before sending (3; the client could also check 4, but 
currently does not).
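
As a rough illustration of that split, here is a minimal Python sketch of the four constraint types and where each one is detected today, following the description above. The enum, its member names, and the helper function are hypothetical and are not part of any Kudu client API.

```python
from enum import Enum


class Violation(Enum):
    """The four write constraints listed above (names are illustrative only)."""
    DUPLICATE_PRIMARY_KEY = 1    # insert of a key that already exists
    MISSING_PRIMARY_KEY = 2      # delete or update of a key that does not exist
    MISSING_RANGE_PARTITION = 3  # any write with no covering range partition
    MISSING_COLUMN_VALUE = 4     # insert omitting a column that has no default


# Where each violation is detected today, per the comment above.
DETECTED_BY = {
    Violation.DUPLICATE_PRIMARY_KEY: "server",
    Violation.MISSING_PRIMARY_KEY: "server",
    Violation.MISSING_RANGE_PARTITION: "client",
    # The client could check this one before sending, but currently does not.
    Violation.MISSING_COLUMN_VALUE: "server",
}


def server_reported():
    """Return the numbers of the constraints that the server reports."""
    return sorted(v.value for v, where in DETECTED_BY.items() if where == "server")
```

Any mechanism for ignoring violations has to cover both detection points, since type 3 never reaches the server.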

Of these constraints, I think type 1 is the most commonly ignored, and that's why 
we decided to add first-class support for it by adding a special operation 
type.  Obviously that approach can't scale to all of the constraint types, much 
less their cross product.

I think I'm in favor of merging the current patch which introduces an INSERT 
IGNORE operation to ignore constraint violations of type 1 on the server side.  
Additionally, we should strongly consider adding session-specific options to 
selectively ignore each type of constraint individually.  So for example, the 
client could use the INSERT IGNORE operation type if they want to selectively 
ignore some instances of duplicate primary-key constraints, or it could call 
{{KuduSession::ignoreDuplicatePrimaryKeyViolations}} to ignore all of them for 
the entire session.  We would also expose flags for the rest of the constraint 
types.
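
To illustrate the difference between the two mechanisms, here is a small Python sketch using an in-memory stand-in for a session. Everything in it (the {{ToySession}} class, the {{ignore}} argument, the {{ignore_duplicate_keys}} attribute) is an assumption made up for illustration; it does not reflect the real Kudu client API or the patch under review.

```python
class ToySession:
    """In-memory stand-in for a session; NOT the Kudu client API."""

    def __init__(self):
        self.rows = {}                      # primary key -> value
        self.ignore_duplicate_keys = False  # hypothetical session-wide flag
        self.errors = []                    # errors surfaced to the application

    def insert(self, key, value, ignore=False):
        """Plain INSERT, or INSERT IGNORE when ignore=True."""
        if key in self.rows:
            if ignore or self.ignore_duplicate_keys:
                return  # violation suppressed; the existing row is untouched
            self.errors.append(("duplicate primary key", key))
            return
        self.rows[key] = value


session = ToySession()
session.insert(1, "a")               # succeeds
session.insert(1, "b", ignore=True)  # per-operation ignore: suppressed
session.insert(1, "c")               # plain INSERT: error reported
session.ignore_duplicate_keys = True
session.insert(1, "d")               # session-wide ignore: suppressed
```

The per-operation form lets a single batch mix ignoring and non-ignoring inserts, while the session flag applies to every write without changing the operation type.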

Finally, the client should expose how many violations of each type were ignored 
in the session statistics.
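
One possible shape for such statistics is a per-type counter, sketched below in Python. The class, its methods, and the string keys are hypothetical; the comment does not specify how the statistics would be exposed.

```python
from collections import Counter


class IgnoredViolationStats:
    """Hypothetical per-session tally of ignored violations, keyed by type."""

    def __init__(self):
        self._counts = Counter()

    def record(self, violation_type):
        """Note that one violation of the given type was ignored."""
        self._counts[violation_type] += 1

    def snapshot(self):
        """Return a plain dict copy of the counts collected so far."""
        return dict(self._counts)


stats = IgnoredViolationStats()
for v in ["duplicate_primary_key", "duplicate_primary_key", "missing_range_partition"]:
    stats.record(v)
```

Keeping a separate count per type lets an application tell expected suppressions (such as duplicate keys on retry) apart from ones that may indicate a bug.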

> Add support for INSERT IGNORE
> -----------------------------
>
>                 Key: KUDU-1563
>                 URL: https://issues.apache.org/jira/browse/KUDU-1563
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Dan Burkert
>            Assignee: Brock Noland
>              Labels: newbie
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first class operation type 
> that is handled on the server side.  This would have a modest perf. 
> improvement since less errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
