Sergey, Thanks for detailed explanation! I agree with your suggestion, let's do on-heap (points 1 to 3) first and then switch to off-heap.
Can you please create a parent ticket with necessary subtasks, so we can properly track it in Jira? D. On Wed, Sep 23, 2015 at 11:10 AM, Sergi Vladykin <[email protected]> wrote: > As part of this discussion I want to talk about non-snapshotable indexes in > general. > > There are few things we can do to improve performance of queries and > indexing. > > 1. We can implement SkipList based index. Probably it will not be much > faster than current SnapTree, but GC and Optimizer will feel better. > 2. We can implement ConcurrentMap based index as I wrote earlier. For joins > on big caches it must give us a good boost. > > 3. Because these indexes are not snapshotable we can eliminate taking > writeLock and for taking snapshots. This writeLock stops concurrent updates > of indexes, so on mixed query/update workloads it will be noticeable > improvement. > > 4. Implement off-heap SkipList. I already did that for Hadoop Accelerator > (though it was append only), so this code can be taken as a base. > 5. Implement offheap ConcurrentMap. Probably we can use something from our > current GridOffheapProcessor. > > The benefit of this is improved performance. > The drawback is that all the updates (even partial) will be immediately > visible and index search can occur when index is in inconsistent state like > when on update the old value was already removed from index but the new one > was not added yet. > > I think items 1-2 can be done in a few days, item 3 can take couple of days > as well. Items 4-5 are probably the toughest ones, I think we should > postpone them until we will be production ready with 1-3. > > Sergi > > > > 2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[email protected]>: > > > On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin < > [email protected]> > > wrote: > > > > > Not necessary, you can configure to have either sorted index or hash > > index > > > or both. > > > In the last case as far as I understand optimizer just will pick up > hash > > > index for > > > equality conditions because it will have lower cost. > > > > > > The only thing I'm currently not sure of is how to add this to > > > configuration > > > (our indexed types config already looks like piece of crap, don't want > to > > > complicate it even more). > > > > > > > I think index type should be specified at the annotation level. As far as > > configuring query metadata in XML, I agree with you, we should clean up > the > > design. > > > > > > > > > > Sergi > > > > > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[email protected]>: > > > > > > > Sergi, > > > > > > > > Does it mean that field "a" will now have 2 indexes, hash and sorted? > > > > > > > > D. > > > > > > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin < > > > [email protected]> > > > > wrote: > > > > > > > > > Guys, > > > > > > > > > > It seems that for simple equality queries like > > > > > > > > > > SELECT * FROM x WHERE a = ? > > > > > > > > > > it is more effective to use hash indexes which do not even need to > be > > > > > snapshotable. > > > > > I think it will be easy to implement one based on > ConcurrentHashMap. > > > > > > > > > > Thoughts? > > > > > > > > > > Sergi > > > > > > > > > > > > > > >
