Re: Rework storage format to index-organized approach

Alexey Kuznetsov Wed, 29 Nov 2017 00:15:06 -0800

Vova,

If we are going to rework indexes, could we also think about supporting TTL
indexes (like in MongoDB [1])?



[1] https://docs.mongodb.com/manual/core/index-ttl/
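A TTL index, as MongoDB describes it [1], is an index on a time field whose entries are removed automatically once they expire. Below is a minimal sketch of how such an index could sit next to primary storage. It assumes an in-memory map as the "primary" store and a sorted map keyed by absolute expiry time as the TTL index. The class and method names (TtlIndexSketch, purgeExpired) are hypothetical and are not Ignite or MongoDB APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a TTL index: a secondary index ordered by expiry time.
final class TtlIndexSketch {
    private final Map<String, String> data = new HashMap<>();           // primary storage: key -> value
    private final Map<String, Long> expireAtByKey = new HashMap<>();    // key -> absolute expiry (ms)
    private final TreeMap<Long, Set<String>> ttlIndex = new TreeMap<>(); // expiry (ascending) -> keys

    void put(String key, String value, long nowMs, long ttlMs) {
        remove(key); // drop any previous version and its TTL index slot
        long expireAt = nowMs + ttlMs;
        data.put(key, value);
        expireAtByKey.put(key, expireAt);
        ttlIndex.computeIfAbsent(expireAt, t -> new HashSet<>()).add(key);
    }

    void remove(String key) {
        Long expireAt = expireAtByKey.remove(key);
        if (expireAt == null) {
            return; // key not present
        }
        data.remove(key);
        Set<String> keys = ttlIndex.get(expireAt);
        keys.remove(key);
        if (keys.isEmpty()) {
            ttlIndex.remove(expireAt);
        }
    }

    String get(String key) {
        return data.get(key);
    }

    // Periodic sweep: because the index is sorted by expiry, it only visits the
    // already-expired prefix instead of scanning all rows.
    int purgeExpired(long nowMs) {
        int removed = 0;
        while (!ttlIndex.isEmpty() && ttlIndex.firstKey() <= nowMs) {
            Map.Entry<Long, Set<String>> expired = ttlIndex.pollFirstEntry();
            for (String key : expired.getValue()) {
                data.remove(key);
                expireAtByKey.remove(key);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        TtlIndexSketch idx = new TtlIndexSketch();
        idx.put("a", "1", 0, 100); // expires at t=100
        idx.put("b", "2", 0, 500); // expires at t=500
        System.out.println(idx.purgeExpired(200)); // 1
        System.out.println(idx.get("a"));          // null
        System.out.println(idx.get("b"));          // 2
    }
}
```

As in MongoDB, an entry in this sketch stays readable between its expiry time and the next sweep; a real implementation might also filter expired rows on read.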


On Tue, Nov 28, 2017 at 3:46 AM, Vladimir Ozerov <[email protected]>
wrote:

> Igniters,
>
> I'd like to start a discussion about a new storage format for Ignite. Our
> current approach is so-called *heap-organized* storage with a secondary
> index per partition. It has a number of drawbacks:
> 1) Slow scans (joins, OLAP workloads) - data is written in an arbitrary
> order, so iterating over the base index leads to multiple page reads and
> page locks
> 2) Slow writes for OLTP workloads - every update touches multiple index
> and free-list pages (a kind of write amplification)
> 3) Duplicated PK index when SQL is enabled - our base index cannot be used
> for lookups or range scans. This makes write amplification effects even
> worse.
>
> Many mature RDBMSs employ an alternative format by default:
> *index-organized* storage. In this case, the leaf pages of the primary
> index are the data pages, and rows are sorted inside them. This gives:
> - Blazingly fast scans (no dereferencing, fewer page reads, fewer
> evictions, fewer locks)
> - Fast writes in OLTP workloads when the PK column (e.g. ID) grows
> monotonically (you need to *update only one page* if there are no splits)
> - Slower random writes compared to heap, due to index fragmentation
>
> I propose to adopt this approach in two phases:
> 1) Optionally store data in the index leaf pages [1]. This should improve
> ScanQuery performance dramatically
> 2) Optionally have a single primary index instead of a per-partition index
> [2]. This should improve our updates and SQL scans at the cost of more
> complex rebalancing and recovery.
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7026
> [2] https://issues.apache.org/jira/browse/IGNITE-7027
>
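The scan argument in the quoted proposal can be illustrated with a rough page-access model. This is a sketch under stated assumptions, not Ignite's PageMemory or B+tree code: every page (data or index leaf) holds 4 rows, keys are distinct, there is no buffer cache, and a new page access (read plus lock) is counted whenever a key-ordered scan moves to a different page. The class and method names are hypothetical; List.of requires Java 9+.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical page-access model comparing heap-organized and index-organized scans.
final class ScanCostSketch {
    // Assumption: data pages and index leaf pages both hold this many rows/entries.
    static final int ROWS_PER_PAGE = 4;

    static int pagesFor(int rows) {
        return (rows + ROWS_PER_PAGE - 1) / ROWS_PER_PAGE; // ceiling division
    }

    // Heap-organized: rows are appended to data pages in arrival order and the
    // index leaves only link key -> data page. A key-ordered scan reads every
    // leaf page and dereferences a data page for each row; a page access is
    // counted each time the target data page changes.
    static int heapScanPageAccesses(List<Integer> arrivalOrder) {
        Map<Integer, Integer> pageOfKey = new HashMap<>();
        for (int i = 0; i < arrivalOrder.size(); i++) {
            pageOfKey.put(arrivalOrder.get(i), i / ROWS_PER_PAGE);
        }
        List<Integer> sortedKeys = new ArrayList<>(arrivalOrder);
        Collections.sort(sortedKeys);

        int dataPageAccesses = 0;
        int lastPage = -1;
        for (int key : sortedKeys) {
            int page = pageOfKey.get(key);
            if (page != lastPage) {
                dataPageAccesses++;
                lastPage = page;
            }
        }
        return pagesFor(sortedKeys.size()) + dataPageAccesses;
    }

    // Index-organized: the leaf pages are the data pages and rows are kept
    // sorted inside them, so a key-ordered scan reads each leaf page once.
    static int indexOrganizedScanPageAccesses(List<Integer> arrivalOrder) {
        return pagesFor(arrivalOrder.size());
    }

    public static void main(String[] args) {
        // Keys 0..15 arriving interleaved, so consecutive keys land on different data pages.
        List<Integer> interleaved = List.of(0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15);
        System.out.println(heapScanPageAccesses(interleaved));           // 20
        System.out.println(indexOrganizedScanPageAccesses(interleaved)); // 4
    }
}
```

With the interleaved arrival order, the heap-organized scan touches 4 leaf pages plus 16 data-page switches (20 in total), while the index-organized scan touches only the 4 leaf pages. Even with keys arriving in order, the heap model still pays for separate leaf and data pages (8 versus 4).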



-- 
Alexey Kuznetsov
