[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709366#comment-16709366
 ] 

Brock Noland commented on KUDU-1563:
------------------------------------

Hey all,

I've got a use case that could really benefit from
{{INSERT IGNORE DUPLICATE KEY}}, since we will have duplicates at a ratio of
3x, so I am trying to revive this work.

I am not sold on creating an extremely generic approach to server-side error 
ignoring because I think it'll be really easy to abuse. I feel like Kudu 
contributors should have some control over when ignoring errors is allowed so 
we understand and validate the use case.

Furthermore, {{INSERT IGNORE ALL ERRORS}} won't work for my use case because 
we are generating so many duplicates precisely because we are so concerned 
about data loss.

Therefore I am suggesting we add a session-level property that allows the user
to ignore certain server-side errors for {{INSERT IGNORE}}, {{UPDATE IGNORE}},
and {{DELETE IGNORE}} operations. Below is a lightly edited summary from
[~adar] of my proposal:

* Move forward with a new operation {{INSERT IGNORE}}, with the understanding 
that {{UPDATE IGNORE}} and {{DELETE IGNORE}} would be good additions in the 
future. Together they comprise a new set of write operations that may ignore 
certain errors.
* Document that {{INSERT IGNORE}} isn't just about duplicate primary keys; the 
precise set of errors ignored by all of these new write operations is 
configurable.
* Add new {{KuduSession}} properties that control the set of errors ignored by 
write operations. This set will initially just be "duplicate primary key on 
insert". The properties should be combinable (i.e. I should be able to ignore 
duplicate primary keys AND missing partitions), but the granularity will be 
session-level, not operation-level.
* By default, no errors are ignored, so that the user is forced to configure
the precise set they want to ignore.
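
To make the proposed semantics concrete, here is a minimal Java sketch of a combinable, session-level set of ignored errors that starts out empty. Every name in it ({{WriteError}}, {{IgnoreSettingsSketch}}, {{ignore}}, {{report}}) is hypothetical and chosen for illustration only; none of it is the existing Kudu client API, and the real filtering would happen on the server rather than in client code like this.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical error kinds, taken from the two examples named in the proposal.
enum WriteError { DUPLICATE_PRIMARY_KEY, MISSING_PARTITION }

// Hypothetical stand-in for the session-level settings; not a Kudu class.
final class IgnoreSettingsSketch {
    // Default: nothing is ignored, so the user has to opt in explicitly.
    private final Set<WriteError> ignored = EnumSet.noneOf(WriteError.class);

    // Settings are combinable: each call adds one more error kind to the set.
    IgnoreSettingsSketch ignore(WriteError kind) {
        ignored.add(kind);
        return this;
    }

    // Returns the errors that would still be reported back to the caller.
    List<WriteError> report(List<WriteError> rowErrors) {
        List<WriteError> reported = new ArrayList<>();
        for (WriteError e : rowErrors) {
            if (!ignored.contains(e)) {
                reported.add(e);
            }
        }
        return reported;
    }
}
```

With a fresh {{IgnoreSettingsSketch}}, every error is reported. After {{ignore(WriteError.DUPLICATE_PRIMARY_KEY)}}, only the missing-partition error would come back. The set applies to the whole session, not to individual operations, which matches the granularity described above.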


> Add support for INSERT IGNORE
> -----------------------------
>
>                 Key: KUDU-1563
>                 URL: https://issues.apache.org/jira/browse/KUDU-1563
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Dan Burkert
>            Assignee: Brock Noland
>            Priority: Major
>              Labels: newbie
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first class operation type 
> that is handled on the server side.  This would have a modest perf. 
> improvement since fewer errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.



