[ 
https://issues.apache.org/jira/browse/KUDU-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033943#comment-16033943
 ] 

Jean-Daniel Cryans commented on KUDU-2025:
------------------------------------------

I think that if you are comparing INSERT vs UPSERT for initial data loads, then
this is expected. The only way an UPSERT could be faster is if you are updating
existing rows, because then we can short-circuit once we find the right row set.
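
To illustrate the reasoning, here is a rough toy model, not Kudu's actual code. The Tablet class, its dict-based row sets, and the probes counter are all hypothetical names made up for this sketch. It only assumes that a write has to look through row sets for the key and can stop at the first one that contains it.

```python
# Toy model only: this is NOT Kudu's implementation. Tablet, the dict-based
# row sets, and the probes counter are hypothetical, for illustration.
class Tablet:
    def __init__(self, rowsets):
        # Each row set is modelled as a plain dict of key -> value.
        self.rowsets = rowsets
        self.probes = 0  # how many row sets have been consulted so far

    def _find(self, key):
        """Return the row set holding key, stopping at the first hit."""
        for rowset in self.rowsets:
            self.probes += 1
            if key in rowset:
                return rowset  # short-circuit: no need to look further
        return None

    def insert(self, key, value):
        # An insert must rule out a duplicate, so a new key is checked
        # against every row set before the row is written.
        if self._find(key) is not None:
            raise KeyError(f"duplicate key: {key}")
        self.rowsets[-1][key] = value

    def upsert(self, key, value):
        rowset = self._find(key)
        if rowset is not None:
            rowset[key] = value  # update path: stopped at the first hit
        else:
            # Insert path: every row set was already checked, exactly as
            # in insert(), so there is nothing to save for a new key.
            self.rowsets[-1][key] = value
```

In this model, with three row sets, an insert or an upsert of a brand-new key each consults all three, while an upsert of a key that lives in the first row set consults only one. That is the sense in which an upsert can only win when it is actually updating.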

> Upsert throughput is 10~20% slower than insert
> ----------------------------------------------
>
>                 Key: KUDU-2025
>                 URL: https://issues.apache.org/jira/browse/KUDU-2025
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Juan Yu
>
> According to Kudu's design, upsert should be faster than insert.
> I ran some tests to compare upsert and insert performance. I picked a few of
> the larger TPC-DS tables (such as store_sales and catalog_sales), each hash
> partitioned by its first 3 columns. The data is generated (so it shouldn't
> contain duplicate keys) and ranges from 100 GB to 1 TB. Each run ingests the
> data into a newly created table.
> In general, upsert throughput is 10~20% slower than insert according to
> CM metrics.


