[ https://issues.apache.org/jira/browse/KUDU-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033947#comment-16033947 ]

Juan Yu commented on KUDU-2025:
-------------------------------

Yes, my tests are initial data loads, so there shouldn't be any duplicate keys.
Feel free to close it.

> Upsert throughput is 10~20% slower than insert
> ----------------------------------------------
>
>                 Key: KUDU-2025
>                 URL: https://issues.apache.org/jira/browse/KUDU-2025
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Juan Yu
>
> According to Kudu's design, upsert should be faster than insert.
> I ran some tests to compare upsert and insert performance. I picked a few of
> the larger TPC-DS tables (e.g. store_sales, catalog_sales); each table is
> hash partitioned by its first 3 columns. The data are generated (so there
> shouldn't be any duplicate keys) and range from 100 GB to 1 TB. Each time,
> the data are ingested into a newly created table.
> In general, upsert throughput is 10~20% slower than insert throughput,
> according to CM metrics.
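The "10~20% slower" figure is a relative difference between two throughput readings, one from the insert run and one from the upsert run. The helper below is a minimal sketch of that arithmetic. It assumes both readings use the same unit (e.g. rows/sec from the same metric) and that both runs load the same table and data volume. The function name is hypothetical and is not part of Kudu or CM, and the numbers are illustrative only, not measurements from this issue.

```python
def relative_slowdown(insert_rows_per_sec: float, upsert_rows_per_sec: float) -> float:
    """Return how much slower upsert is than insert, as a fraction of insert throughput.

    Hypothetical helper: both arguments are throughput readings (same unit)
    for the same table and data volume, one run per operation type.
    """
    if insert_rows_per_sec <= 0:
        raise ValueError("insert throughput must be positive")
    return (insert_rows_per_sec - upsert_rows_per_sec) / insert_rows_per_sec


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this issue.
    print(f"{relative_slowdown(100_000, 85_000):.0%}")  # 15%
```

Under this definition, a result between 0.10 and 0.20 corresponds to the reported 10~20% range.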



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
