Todd Lipcon created KUDU-1439:
---------------------------------

             Summary: Optimization for batch inserts into empty key ranges
                 Key: KUDU-1439
                 URL: https://issues.apache.org/jira/browse/KUDU-1439
             Project: Kudu
          Issue Type: Improvement
          Components: perf, tablet
            Reporter: Todd Lipcon


Got this idea from a CockroachDB optimization:
https://github.com/cockroachdb/cockroach/pull/6375

The short version is that if we have a moderately large batch of inserts, we 
can do the following:
- sort the inserts by primary key
- instead of using bloom filter checks, use SeekAtOrAfter on the first primary 
key in the batch. This yields the smallest primary key at or after that key 
which might exist in the table (_nextKey_).
- for each of the keys in the sorted batch, if it's less than _nextKey_, it 
cannot already exist, so we don't need to do an existence check for it.
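
The steps above can be sketched roughly as follows. This is only an 
illustration in Python, not Kudu code: the tablet's existing primary keys are 
modeled as a sorted list, and seek_at_or_after and 
keys_needing_existence_check are hypothetical helpers named for this example 
(the first stands in for the SeekAtOrAfter call mentioned above).

```python
import bisect


def seek_at_or_after(existing_keys, key):
    """Hypothetical stand-in for SeekAtOrAfter on a sorted list of keys.

    Returns the smallest existing key >= key, or None if there is none.
    """
    i = bisect.bisect_left(existing_keys, key)
    return existing_keys[i] if i < len(existing_keys) else None


def keys_needing_existence_check(existing_keys, batch):
    """Return the batch keys that still need a per-key existence check.

    existing_keys must be sorted. Assumption for this sketch: one seek on
    the smallest batch key replaces the per-key checks for every batch key
    that sorts before the key the seek returns.
    """
    batch = sorted(batch)
    if not batch:
        return []
    next_key = seek_at_or_after(existing_keys, batch[0])
    if next_key is None:
        # Nothing exists at or after the first key: no checks needed.
        return []
    # Keys strictly below next_key cannot exist; the rest still need checks.
    first_to_check = bisect.bisect_left(batch, next_key)
    return batch[first_to_check:]


existing = [10, 20, 30]
print(keys_needing_existence_check(existing, [14, 12, 11, 25]))
print(keys_needing_existence_check(existing, [31, 32, 33]))
```

With existing keys 10, 20 and 30, the first batch only needs a check for 25 
(the seek from 11 lands on 20), and the second batch, which falls entirely 
into an empty key range, needs no checks at all.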

In the common case where clients are writing non-overlapping batches of rows 
(e.g. importing from Parquet), this should reduce the number of seeks and bloom 
checks dramatically (by a factor on the order of the batch size). Plus, it 
doesn't require much new code to be written, so it's worth a shot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
