Todd Lipcon created KUDU-1439:
---------------------------------
Summary: Optimization for batch inserts into empty key ranges
Key: KUDU-1439
URL: https://issues.apache.org/jira/browse/KUDU-1439
Project: Kudu
Issue Type: Improvement
Components: perf, tablet
Reporter: Todd Lipcon
Got this idea from a CockroachDB optimization:
https://github.com/cockroachdb/cockroach/pull/6375
The short version is that if we have a moderately large batch of inserts, we
can do the following (pseudocode):
- sort the inserts by primary key
- instead of using bloom filter checks, use SeekAtOrAfter on the first primary
key in the batch. This yields the smallest primary key at or after it that
might exist in the table (_nextKey_).
- for each of the keys in the sorted batch, if it's less than _nextKey_, we
don't need to do an existence check for it.
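
The steps above can be sketched roughly as follows. This is only an
illustration, not Kudu code: a sorted Python list stands in for the tablet's
primary key index, and the names seek_at_or_after and split_batch are
hypothetical. It also assumes that keys at or beyond _nextKey_ simply fall
back to the normal existence check, which the description above does not
spell out.

```python
from bisect import bisect_left


def seek_at_or_after(existing_keys, key):
    """Return the smallest existing key >= key, or None if there is none.

    existing_keys must be sorted; it is a stand-in for the tablet's key index.
    """
    i = bisect_left(existing_keys, key)
    return existing_keys[i] if i < len(existing_keys) else None


def split_batch(existing_keys, batch):
    """Split a batch of insert keys into (skip_check, needs_check)."""
    ordered = sorted(batch)  # sort the inserts by primary key
    if not ordered:
        return [], []

    # One seek for the whole batch instead of one bloom check per row.
    next_key = seek_at_or_after(existing_keys, ordered[0])

    skip_check, needs_check = [], []
    for key in ordered:
        if next_key is None or key < next_key:
            # Falls in the empty range before next_key: cannot already exist.
            skip_check.append(key)
        else:
            # Assumption: fall back to the regular existence check here.
            needs_check.append(key)
    return skip_check, needs_check
```

For example, with existing keys [10, 20, 30] and a batch of [14, 12, 11, 25],
the seek from 11 lands on 20, so 11, 12 and 14 skip the existence check and
only 25 still needs one.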
In the common case where clients are writing non-overlapping batches of rows
(e.g. importing from Parquet), this should dramatically reduce the number of
seeks and bloom filter checks (by a factor on the order of the batch size).
Plus, it doesn't require much new code to be written, so it's worth a shot.