Vladimir,

How will the free lists be affected by the index-organized architecture? From what I see, they're becoming optional.
--
Denis

> On Nov 27, 2017, at 12:46 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
>
> Igniters,
>
> I'd like to start a discussion about a new storage format for Ignite. Our
> current approach is so-called *heap-organized* storage with a secondary index
> per partition. It has a number of drawbacks:
> 1) Slow scans (joins, OLAP workloads) - data is written in arbitrary order,
> so iteration over the base index leads to multiple page reads and page locks
> 2) Slow writes in OLTP workloads - every update touches multiple
> index and free-list pages (a kind of write amplification)
> 3) Duplicated PK index when SQL is enabled - our base index cannot be used
> for lookups or range scans. This makes the write amplification effect even
> worse.
>
> All mature RDBMS systems employ an alternative format as the default -
> *index-organized* storage. In this case the primary index leaf pages are the
> data pages. Rows are sorted inside data pages. This gives:
> - Blazingly fast scans (no dereference, fewer page reads, fewer evictions,
> fewer locks)
> - Fast writes in OLTP workloads when the PK index column (e.g. ID) grows
> monotonically (you need to *update only one page* if there are no splits)
> - Slower random writes due to index fragmentation compared to heap
>
> I propose to adopt this approach in two phases:
> 1) Optionally add data to leaf pages [1]. This should improve our ScanQuery
> dramatically
> 2) Optionally have a single primary index instead of a per-partition index [2].
> This should improve our updates and SQL scans at the cost of harder
> rebalance and recovery.
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7026
> [2] https://issues.apache.org/jira/browse/IGNITE-7027
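
To make the scan argument above concrete, here is a toy sketch of the two layouts. It is not Ignite code: the page capacity, the function names, and the idea of counting "page switches" as a stand-in for page reads and locks are all assumptions made for illustration only.

```python
import random

# Hypothetical value: how many rows fit in one page in this toy model.
PAGE_CAPACITY = 4


def heap_scan_page_switches(keys):
    """Heap-organized layout: rows land in data pages in arrival order,
    and a separate index maps key -> page. A key-ordered scan walks the
    index and dereferences every entry into whichever page holds the row.
    Returns how many times the scan has to move to a different page."""
    page_of = {key: pos // PAGE_CAPACITY for pos, key in enumerate(keys)}
    switches = 0
    current_page = None
    for key in sorted(keys):
        page = page_of[key]
        if page != current_page:
            switches += 1
            current_page = page
    return switches


def index_organized_scan_page_switches(keys):
    """Index-organized layout: leaf pages hold the rows themselves, sorted
    by key, so a key-ordered scan reads every leaf page exactly once."""
    ordered = sorted(keys)
    leaves = [ordered[i:i + PAGE_CAPACITY]
              for i in range(0, len(ordered), PAGE_CAPACITY)]
    return len(leaves)


if __name__ == "__main__":
    rng = random.Random(42)
    keys = list(range(1000))
    rng.shuffle(keys)  # rows arrive in arbitrary key order
    print("heap-organized page switches: ", heap_scan_page_switches(keys))
    print("index-organized page switches:", index_organized_scan_page_switches(keys))
```

With keys arriving in arbitrary order, the heap scan can never do better than one visit per page, and in practice it jumps between pages on most rows, while the index-organized scan stays at one visit per leaf. When keys arrive already sorted, both layouts behave the same in this model, which matches the monotonically growing PK case from the proposal.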