Dima,

1) Primary key lookups could become slightly faster, but no breakthrough is
expected: there will be no need to jump from a B+Tree leaf to a data page,
but the tree itself will be bigger, because data records take more space
than index records, so each leaf holds fewer entries. I expect roughly
parity here.
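
A back-of-the-envelope model of point 1 may help. The sketch below counts
page reads per primary key lookup in both layouts. The page capacities (250
index entries or 20 data rows per leaf, fan-out 250) and the 10M-row table
are assumptions for illustration, not measured Ignite numbers.

```python
import math

# Assumed, illustrative numbers (not measured Ignite values).
ROWS = 10_000_000
INDEX_ENTRIES_PER_LEAF = 250   # small key + link per entry
DATA_ROWS_PER_LEAF = 20        # full rows stored in the leaf itself
FANOUT = 250                   # children per inner page


def tree_height(rows: int, per_leaf: int, fanout: int) -> int:
    """Levels from root to leaf for a B+Tree holding `rows` entries."""
    nodes = math.ceil(rows / per_leaf)     # number of leaf pages
    height = 1
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)  # pages on the next level up
        height += 1
    return height


def heap_lookup_pages(rows: int) -> int:
    # Walk the index down to its leaf, then follow the link to the data page.
    return tree_height(rows, INDEX_ENTRIES_PER_LEAF, FANOUT) + 1


def clustered_lookup_pages(rows: int) -> int:
    # The leaf already holds the row: no extra jump, but the tree is taller.
    return tree_height(rows, DATA_ROWS_PER_LEAF, FANOUT)


print("heap:", heap_lookup_pages(ROWS))                  # 4
print("index-organized:", clustered_lookup_pages(ROWS))  # 4
```

With these numbers both layouts need 4 page reads per lookup: the saved
leaf-to-data jump is eaten by the extra tree level. Whether that extra level
appears depends on row size, which is why parity is the realistic
expectation.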

2) We should observe a dramatic improvement for scans (either ScanQuery or
SqlQuery) because data will be stored sequentially within blocks. Consider
the following case: a table with 10 records which could fit into 1 data
page. In the current (heap) approach these records could be located
anywhere from 1 data block to 10 different data blocks; it all depends on
update timings and free lists. So you end up with 10 page lock/unlock
cycles and up to 10 page reads, which will drive our LRU policy mad. With
the index-organized approach data will be stored in 1 block in the best
case (sequential PK, no fragmentation), or 2-3 blocks in case of page
splits or fragmentation. Clearly, this would be a huge win in terms of
locks, page reads and IO for scan workloads.
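
The counting argument in point 2 can be sketched as follows. The page
layouts are made-up examples, and the model assumes the scan visits records
in key order and locks a page once per run of consecutive records on it.

```python
from typing import Sequence, Tuple


def scan_cost(record_pages: Sequence[int]) -> Tuple[int, int]:
    """Return (page lock/unlock cycles, distinct pages read) for a scan.

    record_pages[i] is the page holding the i-th record in scan order.
    A new lock/unlock cycle starts whenever the next record lives on a
    different page than the previous one.
    """
    cycles = 0
    current = None
    for page in record_pages:
        if page != current:
            cycles += 1
            current = page
    return cycles, len(set(record_pages))


# Hypothetical placements of the same 10 records.
heap = [4, 9, 1, 4, 7, 2, 9, 3, 8, 5]   # scattered by free lists
clustered_best = [0] * 10               # sequential PK, one page
clustered_split = [0] * 6 + [1] * 4     # after one page split

for name, layout in [("heap", heap), ("clustered", clustered_best),
                     ("after split", clustered_split)]:
    print(name, scan_cost(layout))
# heap (10, 8)
# clustered (1, 1)
# after split (2, 2)
```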

3) DML will be faster in case of sequential primary keys, e.g. a (nearly)
monotonic LONG used as a transaction identifier. In this case data will be
laid out in a perfectly sequential manner within individual blocks, and in
most cases an INSERT will lead to 1 data page update and 1 WAL record.
Compare this to 6 WAL records with the current approach. On the other
hand, random INSERTs (e.g. a UUID key) could become slower due to page
splits and fragmentation. Heap-organized storage is preferable in this
case.
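
To illustrate point 3, here is a toy simulation of the leaf level of a
B+Tree. It assumes a common B+Tree optimization that is not confirmed for
Ignite here: when a key is appended past the end of the rightmost full leaf,
a fresh leaf is started instead of splitting 50/50. The capacity of 10 keys
and the 1000-key workload are arbitrary.

```python
import bisect
import random
from typing import Iterable, List, Tuple

CAPACITY = 10  # keys per leaf page (arbitrary, for illustration)


def load_leaves(keys: Iterable[int], capacity: int = CAPACITY) -> Tuple[int, int]:
    """Insert unique keys into a leaf level; return (leaf pages, 50/50 splits)."""
    leaves: List[List[int]] = [[]]  # each leaf is a sorted list of keys
    splits = 0
    for key in keys:
        # leaves[i] covers keys from leaves[i][0] up to the next leaf's first key.
        firsts = [leaf[0] for leaf in leaves[1:]]
        i = bisect.bisect_right(firsts, key)
        leaf = leaves[i]
        if len(leaf) < capacity:
            bisect.insort(leaf, key)
        elif i == len(leaves) - 1 and key > leaf[-1]:
            # Append past the rightmost leaf: start a new page,
            # the old one stays 100% full and nothing is moved.
            leaves.append([key])
        else:
            # Mid-tree split: move the upper half to a new page.
            bisect.insort(leaf, key)
            half = len(leaf) // 2
            leaves.insert(i + 1, leaf[half:])
            del leaf[half:]
            splits += 1
    return len(leaves), splits


sequential = list(range(1000))       # e.g. monotonic LONG ids
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)  # stands in for random UUID order

print("sequential:", load_leaves(sequential))  # (100, 0)
print("random:", load_leaves(shuffled))
```

Sequential keys fill 100 pages completely without a single split, while
shuffled keys trigger many 50/50 splits and leave pages partially empty; the
exact counts for the random case depend on the shuffle.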

4) Ideally we should not have an index per partition, because in that
case PK range scans (typical for OLAP workloads) and JOINs will be slow.
With a single index, however, it would not be that easy to extract or wipe
out an evicted partition. This is another trade-off: fast operations on a
stable system at the cost of slower intermediate processes such as
rebalance and recovery.
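
The trade-off in point 4 can be sketched with sorted Python lists standing
in for B+Trees. NUM_PARTITIONS and the modulo affinity function are
hypothetical (real affinity works differently), and the model only counts
trees probed and entries visited.

```python
import bisect
import heapq
from typing import Iterable, List, Tuple

NUM_PARTITIONS = 4  # hypothetical


def partition_of(key: int) -> int:
    return key % NUM_PARTITIONS  # hypothetical affinity function


def key_range(index: List[int], lo: int, hi: int) -> List[int]:
    """Keys in [lo, hi) from one sorted index."""
    return index[bisect.bisect_left(index, lo):bisect.bisect_left(index, hi)]


def build_per_partition(keys: Iterable[int]) -> List[List[int]]:
    parts: List[List[int]] = [[] for _ in range(NUM_PARTITIONS)]
    for k in keys:
        parts[partition_of(k)].append(k)
    for p in parts:
        p.sort()
    return parts


def range_scan_per_partition(parts: List[List[int]], lo: int, hi: int) -> Tuple[List[int], int]:
    # Every partition tree must be probed and the partial results merged.
    return list(heapq.merge(*(key_range(p, lo, hi) for p in parts))), len(parts)


def range_scan_single(index: List[int], lo: int, hi: int) -> Tuple[List[int], int]:
    # One contiguous range in one tree.
    return key_range(index, lo, hi), 1


def evict_per_partition(parts: List[List[int]], p: int) -> int:
    parts[p] = []  # drop the whole tree; no entries visited
    return 0


def evict_single(index: List[int], p: int) -> int:
    # Partition p's entries are interleaved with all others: visit everything.
    visited = len(index)
    index[:] = [k for k in index if partition_of(k) != p]
    return visited


parts = build_per_partition(range(100))
single = sorted(range(100))
print(range_scan_per_partition(parts, 10, 20))  # keys 10..19, 4 trees probed
print(range_scan_single(single, 10, 20))        # keys 10..19, 1 tree probed
print(evict_per_partition(parts, 1), evict_single(single, 1))  # 0 100
```

Sorting the single index by (partition, key), as suggested further down the
thread, would turn eviction back into a contiguous range delete, but a PK
range scan would then have to probe every partition's sub-range and merge
the results again.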

On Tue, Nov 28, 2017 at 6:27 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Vladimir,
>
> I definitely like the overall direction. My comments are below...
>
>
> On Mon, Nov 27, 2017 at 12:46 PM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> >
> > I propose to adopt this approach in two phases:
> > 1) Optionally add data to leaf pages. This should improve our ScanQuery
> > dramatically
> >
>
> Definitely a good idea. Shouldn't it make the primary lookups faster as
> well?
>
> > 2) Optionally have a single primary index instead of a per-partition index. This
> > should improve our updates and SQL scans at the cost of harder rebalance
> > and recovery.
> >
>
> Can you explain why it would improve SQL updates and Scan queries?
>
> Also, why would this approach make rebalancing slower? If we keep the index
> sorted by partition, then the rebalancing process should be able to grab
> any partition at any time. Do you agree?
>
> D.
>
