Re: SQL Hash indexes

Dmitriy Setrakyan Wed, 23 Sep 2015 09:23:28 -0700

Sergey,

Thanks for detailed explanation! I agree with your suggestion, let's do
on-heap (points 1 to 3) first and then switch to off-heap.


Can you please create a parent ticket with necessary subtasks, so we can
properly track it in Jira?

D.

On Wed, Sep 23, 2015 at 11:10 AM, Sergi Vladykin <[email protected]>
wrote:

> As part of this discussion I want to talk about non-snapshotable indexes in
> general.
>
> There are few things we can do to improve performance of queries and
> indexing.
>
> 1. We can implement SkipList based index. Probably it will not be much
> faster than current SnapTree, but GC and Optimizer will feel better.
> 2. We can implement ConcurrentMap based index as I wrote earlier. For joins
> on big caches it must give us a good boost.
>
> 3. Because these indexes are not snapshotable we can eliminate taking
> writeLock and for taking snapshots. This writeLock stops concurrent updates
> of indexes, so on mixed query/update workloads it will be noticeable
> improvement.
>
> 4. Implement off-heap SkipList. I already did that for Hadoop Accelerator
> (though it was append only), so this code can be taken as a base.
> 5. Implement offheap ConcurrentMap. Probably we can use something from our
> current GridOffheapProcessor.
>
> The benefit of this is improved performance.
> The drawback is that all the updates (even partial) will be immediately
> visible and index search can occur when index is in inconsistent state like
> when on update the old value was already removed from index but the new one
> was not added yet.
>
> I think items 1-2 can be done in a few days, item 3 can take couple of days
> as well. Items 4-5 are probably the toughest ones, I think we should
> postpone them until we will be production ready with 1-3.
>
> Sergi
>
>
>
> 2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>
> > On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <
> [email protected]>
> > wrote:
> >
> > > Not necessary, you can configure to have either sorted index or hash
> > index
> > > or both.
> > > In the last case as far as I understand optimizer just will pick up
> hash
> > > index for
> > > equality conditions because it will have lower cost.
> > >
> > > The only thing I'm currently not sure of is how to add this to
> > > configuration
> > > (our indexed types config already looks like piece of crap, don't want
> to
> > > complicate it even more).
> > >
> >
> > I think index type should be specified at the annotation level. As far as
> > configuring query metadata in XML, I agree with you, we should clean up
> the
> > design.
> >
> >
> > >
> > > Sergi
> > >
> > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > >
> > > > Sergi,
> > > >
> > > > Does it mean that field "a" will now have 2 indexes, hash and sorted?
> > > >
> > > > D.
> > > >
> > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <
> > > [email protected]>
> > > > wrote:
> > > >
> > > > > Guys,
> > > > >
> > > > > It seems that for simple equality queries like
> > > > >
> > > > > SELECT * FROM x WHERE a = ?
> > > > >
> > > > > it is more effective to use hash indexes which do not even need to
> be
> > > > > snapshotable.
> > > > > I think it will be easy to implement one based on
> ConcurrentHashMap.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Sergi
> > > > >
> > > >
> > >
> >
>

Re: SQL Hash indexes

Reply via email to