Todd Lipcon created KUDU-1398:
---------------------------------

             Summary: CFile index blocks can store shortest separating prefix
                 Key: KUDU-1398
                 URL: https://issues.apache.org/jira/browse/KUDU-1398
             Project: Kudu
          Issue Type: Bug
          Components: cfile
    Affects Versions: 0.8.0
            Reporter: Todd Lipcon


Currently, the cfile value index blocks store the entire first value of each 
data block. This is not necessary: we only need to store the shortest string 
that sorts after the last key of the previous block and no later than the 
first key of this block. For example:

Data block 1: apple,banana,cardamom
Data block 2: carrot,epazote,fennel

Today we would store:

Index block entries: ['apple' -> block 1, 'carrot' -> block 2]

Minimally, we can store:

Index block entries: ['' -> block 1, 'care' -> block 2]

In this example only a few bytes are saved, but in the case of longer key 
strings, the savings can be substantial. For example, if the key is a 36-byte 
UUID uniformly distributed, and we have 1000x32KB data blocks in a 32MB cfile, 
we can probably shorten the index entries to only 2-3 bytes on average for a 
big savings.
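
The separator rule described above can be sketched in a few lines. This is
only an illustration in Python, not Kudu's cfile code: the names
shortest_separator, build_index and lookup are hypothetical, keys are assumed
to be byte strings compared bytewise, and keys are assumed to be strictly
increasing across block boundaries. The approach is similar in spirit to
LevelDB's Comparator::FindShortestSeparator.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def shortest_separator(prev_last: Optional[bytes], first: bytes) -> bytes:
    """Return a shortest S such that prev_last < S <= first (bytewise order).

    Hypothetical helper for illustration; not part of Kudu.
    """
    if prev_last is None:
        # First block in the file: the empty string sorts before everything.
        return b""
    if not prev_last < first:
        raise ValueError("keys must be strictly increasing across blocks")

    # Length of the common prefix of the two boundary keys.
    p = 0
    limit = min(len(prev_last), len(first))
    while p < limit and prev_last[p] == first[p]:
        p += 1

    if p == len(prev_last):
        # prev_last is a strict prefix of first: take one more byte of first.
        return first[:p + 1]
    # Otherwise prev_last[p] < first[p]: bump that byte and truncate.
    return prev_last[:p] + bytes([prev_last[p] + 1])


def build_index(blocks: List[List[bytes]]) -> List[Tuple[bytes, int]]:
    """Build (separator, block number) entries for sorted, non-empty blocks."""
    entries = []
    prev_last = None
    for block_no, keys in enumerate(blocks, start=1):
        entries.append((shortest_separator(prev_last, keys[0]), block_no))
        prev_last = keys[-1]
    return entries


def lookup(entries: List[Tuple[bytes, int]], key: bytes) -> int:
    """Return the block whose separator is the greatest one <= key."""
    separators = [sep for sep, _ in entries]
    return entries[bisect_right(separators, key) - 1][1]


blocks = [
    [b"apple", b"banana", b"cardamom"],
    [b"carrot", b"epazote", b"fennel"],
]
index = build_index(blocks)
print(index)  # [(b'', 1), (b'care', 2)]
```

With these entries, lookup(index, b"cardamom") still resolves to block 1 and
lookup(index, b"carrot") to block 2, so seeks land in the same block as they
would with full-length index keys.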



