Juan Yu created KUDU-2025:
-----------------------------
Summary: Upsert throughput is 10~20% slower than insert
Key: KUDU-2025
URL: https://issues.apache.org/jira/browse/KUDU-2025
Project: Kudu
Issue Type: Bug
Reporter: Juan Yu
According to the Kudu design, upsert should be faster than insert.
I ran some tests to compare upsert and insert performance.
I picked a few of the larger tables from TPC-DS (such as store_sales and
catalog_sales); each table is hash partitioned on its first 3 columns. The
data is generated (so it should not contain duplicate keys) and ranges from
100 GB to 1 TB. Each run ingests the data into a newly created table.
In general, upsert throughput is 10~20% lower than insert throughput
according to CM metrics.
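
For reference, the relative difference quoted above can be computed as in the
minimal sketch below. The helper names (throughput, relative_slowdown) and the
row counts and timings are hypothetical, chosen only to illustrate the
arithmetic; they are not measurements from these tests.

```python
def throughput(rows: int, seconds: float) -> float:
    """Return ingest throughput in rows per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows / seconds


def relative_slowdown(insert_rate: float, upsert_rate: float) -> float:
    """Return the fraction by which upsert throughput trails insert throughput."""
    if insert_rate <= 0:
        raise ValueError("insert_rate must be positive")
    return (insert_rate - upsert_rate) / insert_rate


# Hypothetical figures for illustration only (not taken from this report):
# the same number of rows ingested once via insert and once via upsert.
insert_rate = throughput(rows=1_000_000, seconds=100.0)
upsert_rate = throughput(rows=1_000_000, seconds=125.0)
slowdown = relative_slowdown(insert_rate, upsert_rate)
print(f"upsert is {slowdown:.0%} slower than insert")
```

A positive result means upsert is slower than insert; a negative result would
mean upsert is faster, which is what the design leads one to expect.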