Re: SQL Hash indexes

Sergi Vladykin Wed, 23 Sep 2015 09:22:13 -0700

As part of this discussion I want to talk about non-snapshotable indexes in
general.

There are a few things we can do to improve the performance of queries and
indexing.

1. We can implement a SkipList-based index. It probably will not be much
faster than the current SnapTree, but the GC and the optimizer will feel better.
2. We can implement a ConcurrentMap-based index, as I wrote earlier. For joins
on big caches it should give us a good boost.

3. Because these indexes are not snapshotable, we can eliminate taking the
writeLock for taking snapshots. This writeLock stops concurrent updates
of indexes, so on mixed query/update workloads it will be a noticeable
improvement.

4. Implement an off-heap SkipList. I already did that for Hadoop Accelerator
(though it was append-only), so that code can be taken as a base.
5. Implement an off-heap ConcurrentMap. We can probably use something from our
current GridOffheapProcessor.
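
To make items 1 and 2 concrete, here is a minimal Java sketch of the two
non-snapshotable index flavors built on the JDK's java.util.concurrent
collections. It is only an illustration under assumptions: all class and
method names are hypothetical, indexed values and primary keys are plain
longs, and none of this is the actual Ignite or H2 index API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical names throughout; not the Ignite index API.
final class NonSnapshotableIndexes {
    /** Item 2: hash index, indexed value -> primary keys. Equality lookups only. */
    static final class HashIndex {
        private final ConcurrentHashMap<Long, Set<Long>> map = new ConcurrentHashMap<>();

        void put(long val, long key) {
            // compute() is atomic per map key, so adding cannot race with bucket cleanup.
            map.compute(val, (v, keys) -> {
                if (keys == null)
                    keys = ConcurrentHashMap.newKeySet();
                keys.add(key);
                return keys;
            });
        }

        void remove(long val, long key) {
            map.computeIfPresent(val, (v, keys) -> {
                keys.remove(key);
                return keys.isEmpty() ? null : keys; // returning null drops the empty bucket
            });
        }

        /** Returns a live view; no snapshot is taken. */
        Set<Long> find(long val) {
            Set<Long> keys = map.get(val);
            return keys == null ? Collections.<Long>emptySet() : keys;
        }
    }

    /** Item 1: skip-list index over {value, primary key} pairs. Supports ranges. */
    static final class SortedIndex {
        private final ConcurrentSkipListSet<long[]> rows = new ConcurrentSkipListSet<>(
            Comparator.<long[]>comparingLong(r -> r[0]).thenComparingLong(r -> r[1]));

        void put(long val, long key) { rows.add(new long[] {val, key}); }

        void remove(long val, long key) { rows.remove(new long[] {val, key}); }

        /** Primary keys of rows with lo <= value <= hi, in index order. */
        List<Long> range(long lo, long hi) {
            List<Long> res = new ArrayList<>();
            for (long[] r : rows.subSet(new long[] {lo, Long.MIN_VALUE}, true,
                                        new long[] {hi, Long.MAX_VALUE}, true))
                res.add(r[1]);
            return res;
        }
    }

    public static void main(String[] args) {
        HashIndex hash = new HashIndex();
        SortedIndex sorted = new SortedIndex();
        long[][] data = {{1, 10}, {2, 20}, {3, 10}, {4, 30}}; // {primary key, value of "a"}
        for (long[] row : data) {
            hash.put(row[1], row[0]);
            sorted.put(row[1], row[0]);
        }
        System.out.println(hash.find(10));        // like: WHERE a = 10
        System.out.println(sorted.range(10, 20)); // like: WHERE a BETWEEN 10 AND 20
    }
}
```

Neither structure needs a lock around reads or writes, which is what makes
item 3 possible, but neither can hand out a consistent snapshot either.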

The benefit of this is improved performance.
The drawback is that all updates (even partial ones) will be immediately
visible, and an index search can occur while the index is in an inconsistent
state, for example when, during an update, the old value has already been
removed from the index but the new one has not been added yet.
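
The inconsistency window can be shown deterministically. The sketch below
is hypothetical (class, method, and hook names are made up for illustration,
and it assumes an update is applied as remove-old followed by add-new): a
callback stands in for a concurrent reader that searches the index between
the two steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not Ignite code.
final class InconsistentWindowSketch {
    /** Unique hash index: value of field "a" -> primary key. */
    static final ConcurrentHashMap<String, Long> idx = new ConcurrentHashMap<>();

    /** Lock-free, non-atomic update; 'reader' runs between the two steps. */
    static void update(long key, String oldVal, String newVal, Runnable reader) {
        idx.remove(oldVal, key); // step 1: the old entry is gone
        reader.run();            // a concurrent search may happen right here
        idx.put(newVal, key);    // step 2: the new entry becomes visible
    }

    /** Returns what a reader sees for [old value, new value] in the middle of an update. */
    static List<Long> observeMidUpdate() {
        idx.clear();
        idx.put("x", 1L);
        List<Long> seen = new ArrayList<>();
        update(1L, "x", "y", () -> {
            seen.add(idx.get("x"));
            seen.add(idx.get("y"));
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observeMidUpdate()); // what the mid-update reader saw
        System.out.println(idx);                // index state after the update
    }
}
```

In the middle of the update the reader finds row 1 under neither "x" nor
"y", so a query running at that moment would simply miss the row.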

I think items 1-2 can be done in a few days, and item 3 can take a couple of
days as well. Items 4-5 are probably the toughest ones; I think we should
postpone them until we are production ready with 1-3.

Sergi

2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[email protected]>:

> On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <[email protected]>
> wrote:
>
> > Not necessarily, you can configure either a sorted index or a hash index
> > or both.
> > In the last case, as far as I understand, the optimizer will just pick
> > the hash index for equality conditions because it will have a lower cost.
> >
> > The only thing I'm currently not sure of is how to add this to
> > configuration (our indexed types config already looks like piece of
> > crap, don't want to complicate it even more).
> >
>
> I think index type should be specified at the annotation level. As far as
> configuring query metadata in XML, I agree with you, we should clean up the
> design.
>
>
> >
> > Sergi
> >
> > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> >
> > > Sergi,
> > >
> > > Does it mean that field "a" will now have 2 indexes, hash and sorted?
> > >
> > > D.
> > >
> > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <[email protected]>
> > > wrote:
> > >
> > > > Guys,
> > > >
> > > > It seems that for simple equality queries like
> > > >
> > > > SELECT * FROM x WHERE a = ?
> > > >
> > > > it is more effective to use hash indexes which do not even need to be
> > > > snapshotable.
> > > > I think it will be easy to implement one based on ConcurrentHashMap.
> > > >
> > > > Thoughts?
> > > >
> > > > Sergi
> > > >
> > >
> >
>
