Re: SQL query CPU utilization too low.

Andrey Mashenkov Wed, 11 Jan 2017 03:22:32 -0800

I'm done with splitting indices: distributed joins have been fixed, and the
issues with the prepared statement cache have disappeared.
Ticket IGNITE-4106 [1] is ready for review.


[1] https://issues.apache.org/jira/browse/IGNITE-4106

On Mon, Dec 5, 2016 at 3:16 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
wrote:

> I'd prefer to avoid merging ranges from index segments; it is a huge
> performance penalty.
>
> I thought a bit more: why would one configure a different level of query
> parallelism on different caches? I don't see any sane reason for this. Most
> probably it will be the number of CPU cores on the box or some related
> number.
>
> Thus maybe we can allow configuring the SQL parallelism level for the
> cluster and just mark the needed caches as SQL parallel?
>
> A couple more questions:
>
> 1. It looks like joining a segmented table with a non-segmented one will not
> always work, thus we have to prohibit it.
>
> 2. It looks like we must not segment REPLICATED tables at all, because each
> join with a replicated table has to find the needed result.
>
> Sergi
>
>
>
> 2016-12-05 14:36 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
>
> > Copied from the review comment:
> > >Sergi: Another thing is how we will handle the case where different caches
> > >in a join have different parallelism levels?
> > Good question, Sergi. It seems we can't handle it.
> >
> > I have a crazy idea and I'm not sure it is workable.
> > What if we split indices into a power-of-2 number of segments (it can be
> > configured per cache)?
> > Let queries be split across a power-of-2 number of threads, where the number
> > of query threads is less than or equal to the number of segments.
> >
> > If a query involves indices with different numbers of segments, we need
> > some way to map threads to indices.
> > It looks easy if we are able to wrap pairs of indices into a single object
> > to align the number of indices.
> >
> > E.g. say we have Table1 with a parallelism level of 8 and Table2 with a
> > parallelism level of 4. Then we would be able to run 4 threads, where each
> > thread runs on 1 segment of the Table2 index and a wrapped pair of segments
> > of the Table1 index.
> >
> > Thoughts?
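The thread-to-segment mapping proposed above can be sketched as follows. This is only an illustration, not code from the PR: the class `SegmentAlignment` and the method `assign` are hypothetical, and the sketch assumes a row lands in segment `hash mod segments`, so that segment t of a 4-segment table lines up with segments t and t+4 of an 8-segment table.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps query threads onto index segments. */
final class SegmentAlignment {
    /**
     * Returns, for each query thread, the segment ids it should scan in a
     * table with the given number of segments. Assumes segment = hash mod
     * segments, so thread t owns every segment s with s mod threads == t.
     */
    static List<List<Integer>> assign(int segments, int threads) {
        if (segments <= 0 || threads <= 0
            || Integer.bitCount(segments) != 1 || Integer.bitCount(threads) != 1)
            throw new IllegalArgumentException("segments and threads must be powers of 2");
        if (threads > segments)
            throw new IllegalArgumentException("threads must not exceed segments");

        List<List<Integer>> res = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            List<Integer> group = new ArrayList<>();
            for (int s = t; s < segments; s += threads)
                group.add(s);
            res.add(group);
        }
        return res;
    }

    public static void main(String[] args) {
        // The thread count is bounded by the smallest segment count in the join.
        int threads = Math.min(8, 4);
        System.out.println("Table1: " + assign(8, threads));
        System.out.println("Table2: " + assign(4, threads));
    }
}
```

For the Table1/Table2 example this gives each of the 4 threads one Table2 segment and a pair of Table1 segments. Whether a contiguous or a modulo grouping is right depends on how rows are actually assigned to segments, which the thread does not specify.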
> >
> > On Wed, Nov 30, 2016 at 6:31 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
> > wrote:
> >
> > > Cool! I'll take a look today.
> > >
> > > Sergi
> > >
> > > 2016-11-30 18:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > >
> > > > Serj, you can see a PR attached to the JIRA issue [1], which can be
> > > > opened with Upsource [2].
> > > >
> > > > Thanks, I remember about distributed queries and will rework them right
> > > > after we come to an agreement that the solution for simple queries is OK.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-4106
> > > > [2] http://reviews.ignite.apache.org/ignite/review/IGNT-CR-15
> > > >
> > > >
> > > >
> > > > On Wed, Nov 30, 2016 at 5:34 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
> > > > wrote:
> > > >
> > > > > Per cache SQL parallelism level looks reasonable to me here.
> > > > >
> > > > > I'm not sure what you mean by "prepared statement cache is useless
> > > > > with split indices"; most probably you parallelize queries in some
> > > > > wrong way if this is true.
> > > > >
> > > > > Also do not forget about distributed joins: with parallel queries on
> > > > > the same node we will need to make index range requests not only to
> > > > > remote nodes, but to query contexts in parallel threads on the same
> > > > > local node as well.
> > > > >
> > > > > Sergi
> > > > >
> > > > > 2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > > > >
> > > > > > It looks like we can't just split an SQL query across several threads
> > > > > > due to H2 limitations.
> > > > > > We can bind a query thread to a certain set of partitions but, actually,
> > > > > > H2 will read the whole index and then filter entries by partition.
> > > > > > So we can't get a significant speed-up that way.
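The cost of scanning a shared index and filtering by partition, compared with giving each thread its own index part, can be sketched by counting visited index entries. The sketch does not use H2 or Ignite; the class `IndexScanCost`, the method names and the modulo partition function are assumptions made for the example, and the "threads" are simulated by a sequential loop since only the amount of work is counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical illustration; this is not H2 or Ignite code. */
final class IndexScanCost {
    /** Assumed partition function; a real affinity function differs. */
    static int partition(int key, int parts) {
        return Math.floorMod(key, parts);
    }

    /** Each thread walks the whole shared index and keeps only its own partition. */
    static long visitedWithFilter(TreeMap<Integer, String> idx, int threads) {
        long visited = 0;
        long matched = 0;
        for (int t = 0; t < threads; t++) {
            for (int key : idx.keySet()) {
                visited++;
                if (partition(key, threads) == t)
                    matched++;
            }
        }
        if (matched != idx.size())
            throw new IllegalStateException("every entry must match exactly one thread");
        return visited;
    }

    /** The index is split up front, so each thread walks only its own part. */
    static long visitedWithSplit(TreeMap<Integer, String> idx, int threads) {
        List<TreeMap<Integer, String>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++)
            parts.add(new TreeMap<>());
        idx.forEach((k, v) -> parts.get(partition(k, threads)).put(k, v));

        long visited = 0;
        for (TreeMap<Integer, String> part : parts)
            visited += part.size();
        return visited;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> idx = new TreeMap<>();
        for (int i = 0; i < 1000; i++)
            idx.put(i, "row" + i);
        System.out.println("filter: " + visitedWithFilter(idx, 4));
        System.out.println("split:  " + visitedWithSplit(idx, 4));
    }
}
```

With 1000 entries and 4 threads, the filtering scheme visits 4000 entries in total and the split scheme visits 1000, so the per-thread work only shrinks in the second case.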
> > > > > >
> > > > > > Unfortunately, H2 does not support sharding, so we need a workaround.
> > > > > > We can try to split indices, so that each query thread is bound to its
> > > > > > own index part.
> > > > > > I've implemented such a prototype and got a significant speed-up on a
> > > > > > single-node grid, as if it were a several-node grid.
> > > > > > Since H2 knows nothing about split indices, we must make sure that
> > > > > > every query is run as a TwoStepQuery and utilizes all table index parts.
> > > > > >
> > > > > > As index creation on demand is a very heavy operation, an index should
> > > > > > be split when it is created. So we can set the parallelism level on a
> > > > > > per-cache basis but not per-query.
> > > > > >
> > > > > > Another issue I've faced is that our implementation of the prepared
> > > > > > statement cache is useless with split indices. A prepared statement is
> > > > > > cached in a thread-local variable, and it seems that the statement is
> > > > > > bound to a certain index part. So if we reuse the same statement for
> > > > > > different index parts, we will get unexpected results.
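The kind of problem described above can be sketched with a toy cache. This is a guess at the failure mode, not the actual Ignite statement cache: `Stmt`, `BySql` and `BySqlAndSegment` are hypothetical names, and the assumption is that a statement stays bound to the index segment it was first prepared against.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a statement cache with segmented indices. */
final class StatementCacheSketch {
    /** Stand-in for a prepared statement bound to one index segment. */
    static final class Stmt {
        final String sql;
        final int segment;

        Stmt(String sql, int segment) {
            this.sql = sql;
            this.segment = segment;
        }
    }

    /** Cache keyed by SQL text only: a reused statement keeps its first segment. */
    static final class BySql {
        private final Map<String, Stmt> cache = new HashMap<>();

        Stmt get(String sql, int segment) {
            return cache.computeIfAbsent(sql, s -> new Stmt(s, segment));
        }
    }

    /** Cache keyed by SQL text and segment: each segment gets its own statement. */
    static final class BySqlAndSegment {
        private final Map<String, Stmt> cache = new HashMap<>();

        Stmt get(String sql, int segment) {
            return cache.computeIfAbsent(segment + ":" + sql, k -> new Stmt(sql, segment));
        }
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM Person WHERE id = ?";

        BySql naive = new BySql();
        naive.get(sql, 0);
        System.out.println("by SQL only, asked for segment 1, got segment "
            + naive.get(sql, 1).segment);

        BySqlAndSegment keyed = new BySqlAndSegment();
        keyed.get(sql, 0);
        System.out.println("by SQL and segment, asked for segment 1, got segment "
            + keyed.get(sql, 1).segment);
    }
}
```

Including the segment in the cache key is one possible way around the stale binding; the thread does not say how the issue was eventually resolved.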
> > > > > >
> > > > > > On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Completely agree, great point!
> > > > > > >
> > > > > > > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <sergi.vlady...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I think it must be a maximum local parallelism level, not just an
> > > > > > > > `on`/`off` setting (the default is obviously 1). This, along with a
> > > > > > > > separately configurable query thread pool, will give finer-grained
> > > > > > > > control over resources.
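A maximum local parallelism level combined with a dedicated query pool could look roughly like the sketch below. It uses only `java.util.concurrent`; the class `LocalQueryParallelism` and its methods are hypothetical, and a plain sum over an array stands in for a query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of a capped parallelism level plus a query pool. */
final class LocalQueryParallelism {
    /** Caps the requested level by the configured maximum; the floor is 1. */
    static int effective(int requested, int maxLocal) {
        return Math.max(1, Math.min(requested, maxLocal));
    }

    /** Splits the work into `parallelism` strided parts and runs them on the query pool. */
    static long sum(int[] data, int parallelism, ExecutorService queryPool) {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            final int part = p;
            tasks.add(() -> {
                long s = 0;
                for (int i = part; i < data.length; i += parallelism)
                    s += data[i];
                return s;
            });
        }
        try {
            long total = 0;
            for (Future<Long> f : queryPool.invokeAll(tasks))
                total += f.get();
            return total;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // The pool size is configured separately from the parallelism level.
        ExecutorService queryPool = Executors.newFixedThreadPool(2);
        try {
            int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
            System.out.println(sum(data, effective(8, 4), queryPool));
        }
        finally {
            queryPool.shutdown();
        }
    }
}
```

Keeping the two knobs separate means the level bounds how many parts one query is split into, while the pool size bounds how many parts run at once across all queries.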
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > > > > > >
> > > > > > > > > I already mentioned this in another email, but we should be able to
> > > > > > > > > turn this property on and off at the per-query and per-cache levels.
> > > > > > > > >
> > > > > > > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <sergi.vlady...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Agree, let's implement such parallelization.
> > > > > > > > > >
> > > > > > > > > > I think we will need an explicit setting for SqlQuery and
> > > > > > > > > > SqlFieldsQuery; the default behavior should not change.
> > > > > > > > > >
> > > > > > > > > > Sergi
> > > > > > > > > >
> > > > > > > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashen...@gridgain.com>:
> > > > > > > > > >
> > > > > > > > > > > So, currently every SQL query runs on each node in a single thread.
> > > > > > > > > > > This can be an issue for heavy queries or queries running on big
> > > > > > > > > > > data sets, e.g. analytical queries.
> > > > > > > > > > >
> > > > > > > > > > > For now, the only way to speed up such queries is to add more nodes
> > > > > > > > > > > to the grid running on the same server. In this case, data will be
> > > > > > > > > > > partitioned over all these nodes, and the query will be split and
> > > > > > > > > > > run on all nodes.
> > > > > > > > > > >
> > > > > > > > > > > It seems we could benefit from splitting SQL queries locally, as we
> > > > > > > > > > > do across nodes with TwoStepQuery.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thoughts?
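The idea of splitting a query locally the same way it is split across nodes can be sketched as a map step per local data segment followed by a reduce step. This is not Ignite's TwoStepQuery implementation: `LocalTwoStepSketch`, `mapStep` and `reduceStep` are hypothetical names, a list of strings stands in for a table segment, and the "query" is a simple count per value, similar to `SELECT city, COUNT(*) ... GROUP BY city`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of a local map/reduce split of one query. */
final class LocalTwoStepSketch {
    /** Map step: computes partial counts over one local data segment. */
    static Map<String, Long> mapStep(List<String> segment) {
        return segment.stream()
            .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
    }

    /** Reduce step: merges the partial results of all map steps. */
    static Map<String, Long> reduceStep(List<Map<String, Long>> partials) {
        Map<String, Long> res = new TreeMap<>();
        for (Map<String, Long> partial : partials)
            partial.forEach((k, v) -> res.merge(k, v, Long::sum));
        return res;
    }

    /** Runs the map step for every segment in parallel, then reduces once. */
    static Map<String, Long> run(List<List<String>> segments) {
        List<Map<String, Long>> partials = segments.parallelStream()
            .map(LocalTwoStepSketch::mapStep)
            .collect(Collectors.toList());
        return reduceStep(partials);
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("Moscow", "Paris", "Moscow"),
            Arrays.asList("Paris", "Berlin"));
        System.out.println(run(segments));
    }
}
```

The parallel stream is used here only for brevity; a dedicated query pool would give more control over how many map steps run at once.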
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82
