Juan Yu created KUDU-2025:
-----------------------------

             Summary: Upsert throughput is 10~20% slower than insert
                 Key: KUDU-2025
                 URL: https://issues.apache.org/jira/browse/KUDU-2025
             Project: Kudu
          Issue Type: Bug
            Reporter: Juan Yu


According to the Kudu design, upsert should be faster than insert.
I ran some tests to compare upsert and insert performance.
I picked a few of the larger tables from TPC-DS (such as store_sales and 
catalog_sales); each table is hash partitioned by its first 3 columns. The data 
is generated (so it should not contain duplicate keys) and ranges from 100 GB to 
1 TB. Each run ingests the data into a newly created table.
In general, upsert throughput is 10~20% lower than insert throughput according 
to CM metrics.



