[
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709366#comment-16709366
]
Brock Noland commented on KUDU-1563:
------------------------------------
Hey all,
I've got a use case that could really benefit from {{INSERT IGNORE DUPLICATE
KEY}}: we will have duplicates at a ratio of 3x, so I am trying to revive
this work.
I am not sold on creating an extremely generic approach to server-side error
ignoring because I think it'll be really easy to abuse. I feel like Kudu
contributors should have some control over when ignoring errors is allowed so
we understand and validate the use case.
Furthermore, {{INSERT IGNORE ALL ERRORS}} won't work for my use case because
we are generating so many duplicates precisely because we are so concerned
about data loss.
Therefore I am suggesting we add a session-level property that allows the user
to ignore certain server-side errors for {{INSERT IGNORE}}, {{UPDATE IGNORE}},
and {{DELETE IGNORE}} operations. Below is a lightly edited summary from
[~adar] of my proposal:
* Move forward with a new operation {{INSERT IGNORE}}, with the understanding
that {{UPDATE IGNORE}} and {{DELETE IGNORE}} would be good additions in the
future. Together they comprise a new set of write operations that may ignore
certain errors.
* Document that {{INSERT IGNORE}} isn't just about duplicate primary keys; the
precise set of errors ignored by all of these new write operations is
configurable.
* Add new {{KuduSession}} properties that control the set of errors ignored by
write operations. This set will initially just be "duplicate primary key on
insert". The properties should be combinable (i.e. I should be able to ignore
duplicate primary keys AND missing partitions), but the granularity will be
session-level, not operation-level.
By default, no errors are ignored, so that the user is forced to configure the
precise set they want to ignore.
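To make the proposed semantics concrete, here is a minimal Python sketch. It is a toy in-memory model, not the Kudu client API: {{IgnorableError}}, {{Session}}, {{set_ignored_errors}}, and the single key range standing in for partitions are all hypothetical names invented for illustration. It also assumes that only the {{IGNORE}} variant of an operation consults the session-level set, so a plain {{INSERT}} still reports every error.

```python
from enum import Flag, auto


class IgnorableError(Flag):
    """Hypothetical set of server-side errors a session may opt to ignore."""
    NONE = 0
    DUPLICATE_KEY = auto()      # duplicate primary key on insert
    MISSING_PARTITION = auto()  # key is not covered by any partition


class Session:
    """Toy stand-in for a session; not the real KuduSession API."""

    def __init__(self, key_range=range(0, 100)):
        # Default: nothing is ignored, so the user must opt in explicitly.
        self.ignored = IgnorableError.NONE
        self.key_range = key_range  # assumed: one range partition
        self.rows = {}              # primary key -> value
        self.errors = []            # (op, key, error) reported to the caller

    def set_ignored_errors(self, errors):
        # Session-level granularity: applies to every later IGNORE op.
        self.ignored = errors

    def _fail(self, op, key, error, ignore):
        # Swallow the error only for IGNORE ops and only if configured.
        if ignore and (error & self.ignored):
            return
        self.errors.append((op, key, error))

    def insert(self, key, value, ignore=False):
        op = "INSERT IGNORE" if ignore else "INSERT"
        if key not in self.key_range:
            self._fail(op, key, IgnorableError.MISSING_PARTITION, ignore)
        elif key in self.rows:
            self._fail(op, key, IgnorableError.DUPLICATE_KEY, ignore)
        else:
            self.rows[key] = value


session = Session()
session.insert(1, "a")
session.insert(1, "b", ignore=True)  # nothing configured yet: still an error

# Properties are combinable: ignore duplicates AND missing partitions.
session.set_ignored_errors(
    IgnorableError.DUPLICATE_KEY | IgnorableError.MISSING_PARTITION
)
session.insert(1, "c", ignore=True)    # duplicate, silently skipped
session.insert(500, "d", ignore=True)  # no partition, silently skipped
session.insert(1, "e")                 # plain INSERT still reports the error
```

In this model the first row written for a key wins, and the caller sees errors only for operations that were not covered by the configured set.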
> Add support for INSERT IGNORE
> -----------------------------
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
> Issue Type: New Feature
> Reporter: Dan Burkert
> Assignee: Brock Noland
> Priority: Major
> Labels: newbie
>
> The Java client currently has an [option to ignore duplicate row key errors|
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
> which is implemented by filtering the errors on the client side. If we are
> going to continue to support this feature (and the consensus seems to be that
> we probably should), we should promote it to a first class operation type
> that is handled on the server side. This would yield a modest performance
> improvement since fewer errors are returned, and it would allow INSERT IGNORE
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.