[
https://issues.apache.org/jira/browse/KUDU-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Henke updated KUDU-1439:
------------------------------
Labels: performance (was: )
> Optimization for batch inserts into empty key ranges
> ----------------------------------------------------
>
> Key: KUDU-1439
> URL: https://issues.apache.org/jira/browse/KUDU-1439
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tablet
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Labels: performance
>
> Got this idea from a CockroachDB optimization:
> https://github.com/cockroachdb/cockroach/pull/6375
> The short version is that if we have a moderately large batch of inserts,
> we can do the following (pseudocode):
> - sort the inserts by primary key
> - instead of using bloom filter checks, use SeekAtOrAfter on the first
> primary key in the batch. This yields the lowest primary key at or after
> that position that might exist in the table (_nextKey_).
> - for each of the keys in the sorted batch, if it's less than _nextKey_, we
> don't need to do an existence check for it.
> In the common case where clients are writing non-overlapping batches of rows
> (e.g. importing from Parquet), this should reduce the number of seeks and
> bloom checks dramatically (by a factor on the order of the batch size). Plus,
> it doesn't require much new code to be written, so it is worth a shot.
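
The three steps in the description can be sketched roughly as follows. This is
only an illustration, not Kudu code: the sorted Python list stands in for the
tablet's primary key index, bisect_left stands in for SeekAtOrAfter, and the
function name keys_needing_existence_check is hypothetical.

```python
from bisect import bisect_left


def keys_needing_existence_check(batch_keys, existing_sorted_keys):
    """Return the batch keys that still need a per-key existence check.

    existing_sorted_keys is an assumed stand-in for the tablet's sorted
    primary key index; it is not a real Kudu structure.
    """
    # Step 1: sort the inserts by primary key.
    batch = sorted(batch_keys)
    if not batch:
        return []

    # Step 2: stand-in for SeekAtOrAfter on the first key in the batch.
    idx = bisect_left(existing_sorted_keys, batch[0])
    if idx == len(existing_sorted_keys):
        # Nothing exists at or after the start of the batch, so no key
        # in the batch can collide with an existing row.
        return []
    next_key = existing_sorted_keys[idx]

    # Step 3: keys strictly below next_key cannot exist in the table,
    # so only the remaining keys need an existence check.
    cut = bisect_left(batch, next_key)
    return batch[cut:]
```

For a batch that falls entirely into an empty key range, the returned list is
empty, so the whole batch costs a single seek instead of one check per row.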