I'm done with splitting indices: distributed joins have been fixed, and the issues with the prepared statement cache have disappeared. Ticket IGNITE-4106 [1] is ready for review.

[1] https://issues.apache.org/jira/browse/IGNITE-4106

On Mon, Dec 5, 2016 at 3:16 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> I'd prefer to avoid merging ranges from index segments; it is a huge
> performance penalty.
>
> I thought a bit more: why would one configure a different level of query
> parallelism on different caches? I don't see any sane reason for this. Most
> probably it will be the number of CPU cores on the box or some related
> number.
>
> Thus maybe we can allow configuring the SQL parallelism level for the
> cluster and just mark the needed caches as SQL parallel?
>
> A couple more questions:
>
> 1. It looks like joining a segmented table with a non-segmented one will not
> always work, thus we have to prohibit it.
>
> 2. It looks like we must not segment REPLICATED tables at all, because each
> join with a replicated table has to find the needed result.
>
> Sergi
>
> 2016-12-05 14:36 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
>
> > Copy from the review comment:
> > >Sergi: Another thing is how we will handle the case where different caches in a
> > >join have different parallelism levels?
> > Good question, Sergi. It seems we can't handle it.
> >
> > I have a crazy idea and am not sure it is workable.
> > What if we split indices into a power-of-2 number of segments (it can be
> > configured per cache)?
> > Let queries be split across a power-of-2 number of threads, where the number of
> > query threads is less than or equal to the number of segments.
> >
> > If a query involves indices with different numbers of segments, we need
> > some way to map threads to indices.
> > It looks easy if we are able to wrap pairs of index segments into a
> > single object to align the number of indices.
> >
> > E.g. say Table1 has a parallelism level of 8 and Table2 has a
> > parallelism level of 4. Then we would be able to run 4 threads, where each
> > thread runs on 1 segment of the Table2 index and a wrapped pair of
> > Table1 index segments.
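
The mapping described in the quoted paragraph above could be sketched roughly as follows. This is only an illustration, not the code in the PR: the function name `map_segments` is hypothetical, and the sketch assumes that a key's segment is chosen as `hash(key) mod segment_count`, which the thread does not state. Under that assumption, power-of-2 segment counts let every thread see matching keys from both tables.

```python
def map_segments(segment_counts, threads):
    """Assign index segments of each table to query threads.

    segment_counts: dict of table name -> number of index segments.
    threads: number of query threads.
    Both must be powers of 2, with threads <= every segment count.
    Returns one dict per thread: table name -> list of segment ids.

    Assumption (not from the thread): a key lives in segment
    hash(key) % segment_count, so segment s belongs to thread s % threads.
    """
    def is_pow2(n):
        return n > 0 and n & (n - 1) == 0

    if not is_pow2(threads):
        raise ValueError("thread count must be a power of 2")

    plan = [dict() for _ in range(threads)]
    for table, segments in segment_counts.items():
        if not is_pow2(segments):
            raise ValueError(f"{table}: segment count must be a power of 2")
        if segments < threads:
            raise ValueError(f"{table}: fewer segments than query threads")
        for t in range(threads):
            # Wrap all segments congruent to t into one group for thread t.
            plan[t][table] = [s for s in range(segments) if s % threads == t]
    return plan


# The example from the quoted message: Table1 has 8 segments, Table2 has 4.
plan = map_segments({"Table1": 8, "Table2": 4}, threads=4)
# plan[0] is {'Table1': [0, 4], 'Table2': [0]}
```

With this grouping, a key that falls into Table2 segment `k % 4` always falls into one of the two Table1 segments wrapped for the same thread, because `(k % 8) % 4 == k % 4`. That divisibility is the reason the power-of-2 restriction helps here.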
> >
> > Thoughts?
> >
> > On Wed, Nov 30, 2016 at 6:31 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> >
> > > Cool! I'll take a look today.
> > >
> > > Sergi
> > >
> > > 2016-11-30 18:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > >
> > > > Serj, you can see a PR attached to the JIRA issue [1], which can be opened
> > > > with Upsource [2].
> > > >
> > > > Thanks, I remember about distributed queries and will rework them right
> > > > after we come to an agreement that the solution for simple queries is OK.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-4106
> > > > [2] http://reviews.ignite.apache.org/ignite/review/IGNT-CR-15
> > > >
> > > > On Wed, Nov 30, 2016 at 5:34 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> > > >
> > > > > A per-cache SQL parallelism level looks reasonable to me here.
> > > > >
> > > > > I'm not sure what you mean by "prepared statement cache is useless
> > > > > with split indices"; most probably you parallelize queries in some
> > > > > wrong way if this is true.
> > > > >
> > > > > Also do not forget about distributed joins: with parallel queries on the
> > > > > same node we will need to make index range requests not only to remote
> > > > > nodes, but to query contexts in parallel threads on the same local node
> > > > > as well.
> > > > >
> > > > > Sergi
> > > > >
> > > > > 2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > > > >
> > > > > > It looks like we can't just split an SQL query across several threads due to
> > > > > > H2 limitations.
> > > > > > We can bind a query thread to a certain set of partitions, but H2 will
> > > > > > actually read the whole index and then filter entries by partition.
> > > > > > So we cannot get a significant speed-up that way.
> > > > > >
> > > > > > Unfortunately, H2 does not support sharding, so we need a
> > > > > > workaround. We can try to split indices, so that each query thread is
> > > > > > bound to its own index part.
> > > > > > I've implemented such a prototype and got a significant speed-up on a
> > > > > > single-node grid, as if it were a grid of several nodes.
> > > > > > Since H2 knows nothing about split indices, we must make sure that
> > > > > > every query is run as a TwoStepQuery and utilizes all table index parts.
> > > > > >
> > > > > > As index creation on demand is a very heavy operation, an index should be
> > > > > > split when it is created. So we can set the parallelism level on a
> > > > > > per-cache basis, but not per query.
> > > > > >
> > > > > > Another issue I've faced is that our implementation of the prepared statement
> > > > > > cache is useless with split indices. A prepared statement is cached in a
> > > > > > thread-local variable, and it seems that the statement is bound to a
> > > > > > certain index part. So if we reuse the same statement for different index
> > > > > > parts, we will get unexpected results.
> > > > > >
> > > > > > On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > > > > >
> > > > > > > Completely agree, great point!
> > > > > > >
> > > > > > > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think it must be a maximum local parallelism level, not just an
> > > > > > > > `on`/`off` setting (the default is obviously 1). This, along with a
> > > > > > > > separately configurable query thread pool, will give finer-grained
> > > > > > > > control over resources.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > > > > > >
> > > > > > > > > I already mentioned this in another email, but we should be able to turn
> > > > > > > > > this property on and off at the per-query and per-cache levels.
> > > > > > > > >
> > > > > > > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Agree, let's implement such a parallelization.
> > > > > > > > > >
> > > > > > > > > > I think we will need an explicit setting for SqlQuery and SqlFieldsQuery;
> > > > > > > > > > the default behavior should not change.
> > > > > > > > > >
> > > > > > > > > > Sergi
> > > > > > > > > >
> > > > > > > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashen...@gridgain.com>:
> > > > > > > > > >
> > > > > > > > > > > So, currently every SQL query runs on each node in a single thread. This can
> > > > > > > > > > > be an issue for heavy queries or queries running on big data sets, e.g.
> > > > > > > > > > > analytical queries.
> > > > > > > > > > >
> > > > > > > > > > > For now, the only way to speed up such queries is to add more nodes to the
> > > > > > > > > > > grid running on the same server. In this case, data will be partitioned over
> > > > > > > > > > > all these nodes and the query will be split and run on all nodes.
> > > > > > > > > > >
> > > > > > > > > > > It seems we can benefit if we split SQL queries locally, as we do
> > > > > > > > > > > across nodes with TwoStepQuery.
> > > > > > > > > > >
> > > > > > > > > > > Thoughts?

--
Best regards,
Andrey V. Mashenkov
Tel.: +7-921-932-61-82