[
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023372#comment-16023372
]
Dan Burkert commented on KUDU-1563:
-----------------------------------
Just learned about a use case that would be well served by an {{ON DUPLICATE
KEY UPDATE}} mechanism in Kudu. In particular, the workload ingests batches
of timestamped records, each record being quite large. Individual batches
routinely contain duplicate records whose contents differ only by collection
timestamp. Ideally, as new batches are ingested, duplicate records would
update the collection timestamp column but skip updating the larger data
columns. To do this effectively, we could add a duplicate-resolution strategy
that updates individual columns to new values: effectively {{ON DUPLICATE KEY
UPDATE}} with only constants allowed as the update value. To be efficient,
and to map well to SQL, this should probably be specified once for the entire
batch rather than on individual ops.
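For illustration only, the desired per-batch resolution resembles MySQL's
{{ON DUPLICATE KEY UPDATE}} syntax. A minimal sketch, assuming a hypothetical
{{records}} table (Kudu does not support this today):

```sql
-- Hypothetical: on a duplicate key, refresh only the collection timestamp
-- and leave the large payload column untouched.
INSERT INTO records (id, payload, collected_at)
VALUES (42, '<large payload>', '2017-05-24 10:00:00')
ON DUPLICATE KEY UPDATE collected_at = '2017-05-24 10:00:00';
```

Note that the update value is a constant from the same batch, matching the
restriction proposed above.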
> Add support for INSERT IGNORE
> -----------------------------
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
> Issue Type: New Feature
> Reporter: Dan Burkert
> Assignee: Brock Noland
> Labels: newbie
>
> The Java client currently has an [option to ignore duplicate row key errors|
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
> which is implemented by filtering the errors on the client side. If we are
> going to continue supporting this feature (and the consensus seems to be
> that we probably should), we should promote it to a first-class operation
> type handled on the server side. This would yield a modest performance
> improvement, since fewer errors would be returned, and it would allow INSERT
> IGNORE ops to be mixed into the same batch as other INSERT, DELETE, UPSERT,
> etc. ops.
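For comparison, the proposed first-class op would mirror MySQL's {{INSERT
IGNORE}}: a duplicate key is skipped on the server instead of being returned
as a per-row error for the client to filter. A sketch, with a hypothetical
table name:

```sql
-- Hypothetical: server-side INSERT IGNORE; a row whose key already exists
-- is silently skipped, so no error reaches the client.
INSERT IGNORE INTO records (id, payload, collected_at)
VALUES (42, '<large payload>', '2017-05-24 10:00:00');
```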
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)