Re: SQL query CPU utilization too low.

Sergi Vladykin Wed, 30 Nov 2016 07:32:46 -0800

Cool! I'll take a look today.

Sergi


2016-11-30 18:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:

> Serj, you can see a PR attached to the JIRA issue [1]; it can be opened with
> Upsource [2].
>
> Thanks, I remember about distributed queries and will rework them right
> after we come to an agreement that the solution for simple queries is OK.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-4106
> [2] http://reviews.ignite.apache.org/ignite/review/IGNT-CR-15
>
>
>
> On Wed, Nov 30, 2016 at 5:34 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
> wrote:
>
> > A per-cache SQL parallelism level looks reasonable to me here.
> >
> > I'm not sure what you mean by "prepared statement cache is useless
> > with split indices"; most probably you are parallelizing queries in
> > some wrong way if this is true.
> >
> > Also do not forget about distributed joins: with parallel queries on the
> > same node we will need to make index range requests not only to remote
> > nodes, but also to query contexts in parallel threads on the same local
> > node.
> >
> > Sergi
> >
> > 2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> >
> > > It looks like we can't just split an SQL query across several threads,
> > > due to H2 limitations.
> > > We can bind a query thread to a certain set of partitions, but H2 will
> > > actually read the whole index and then filter entries by partition. So
> > > we can't get a significant speed-up that way.
> > >
> > > Unfortunately, H2 does not support sharding, so we need a workaround.
> > > We can try to split indices, so that each query thread is bound to its
> > > own index part.
> > > I've implemented such a prototype and got a significant speed-up on a
> > > single-node grid, as if it were a grid of several nodes.
> > > Since H2 knows nothing about split indices, we must make sure that every
> > > query runs as a TwoStepQuery and uses all of the table's index parts.
> > >
> > > As creating an index on demand is a very heavy operation, the index
> > > should be split when it is created. So we can set the parallelism level
> > > on a per-cache basis, but not per query.
> > >
> > > Another issue I've faced is that our implementation of the prepared
> > > statement cache is useless with split indices. A prepared statement is
> > > cached in a thread-local variable, and it seems that the statement is
> > > bound to a certain index part. So if we reuse the same statement for
> > > different index parts, we will get unexpected results.
> > >
> > > On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan
> > > <dsetrak...@apache.org> wrote:
> > >
> > > > Completely agree, great point!
> > > >
> > > > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin
> > > > <sergi.vlady...@gmail.com> wrote:
> > > >
> > > > > I think it must be a maximum local parallelism level rather than
> > > > > just an `on`/`off` setting (the default is obviously 1). This, along
> > > > > with a separately configurable query thread pool, will give
> > > > > finer-grained control over resources.
> > > > >
> > > > > Sergi
> > > > >
> > > > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > > >
> > > > > > I already mentioned this in another email, but we should be able
> > > > > > to turn this property on and off at the per-query and per-cache
> > > > > > levels.
> > > > > >
> > > > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin
> > > > > > <sergi.vlady...@gmail.com> wrote:
> > > > > >
> > > > > > > Agree, let's implement such a parallelization.
> > > > > > >
> > > > > > > I think we will need an explicit setting for SqlQuery and
> > > > > > > SqlFieldsQuery; the default behavior should not change.
> > > > > > >
> > > > > > > Sergi
> > > > > > >
> > > > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashen...@gridgain.com>:
> > > > > > >
> > > > > > > > Currently, every SQL query runs in a single thread on each
> > > > > > > > node. This can be an issue for heavy queries or queries
> > > > > > > > running on big data sets, e.g. analytical queries.
> > > > > > > >
> > > > > > > > For now, the only way to speed up such queries is to add more
> > > > > > > > nodes to the grid running on the same server. In this case,
> > > > > > > > the data will be partitioned over all these nodes, and the
> > > > > > > > query will be split and run on all of them.
> > > > > > > >
> > > > > > > > It seems we could benefit from splitting SQL queries locally,
> > > > > > > > as we do across nodes with TwoStepQuery.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > > Tel. +7-921-932-61-82
> > >
> >
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
> Tel. +7-921-932-61-82
>
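
The difference between binding a thread to a set of partitions over one shared index and giving each thread its own index part, as discussed in the quoted messages, can be illustrated with a small sketch. This is not Ignite or H2 code: the partition count, the key-to-segment mapping, and every function name here are assumptions made up for the illustration. It only counts how many index entries each approach visits for the same aggregate.

```python
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = 16  # hypothetical partition count, chosen only for this sketch


def segment_of(key, parallelism):
    """Map a key to its partition, then the partition to an index segment."""
    partition = key % PARTITIONS
    return partition % parallelism


def build_split_index(rows, parallelism):
    """Split the index when it is created: one sorted list per segment."""
    segments = [[] for _ in range(parallelism)]
    for key, value in rows:
        segments[segment_of(key, parallelism)].append((key, value))
    for segment in segments:
        segment.sort()
    return segments


def filtered_scan(shared_index, segment_id, parallelism, predicate):
    """Shared index: the worker visits every entry and drops foreign ones."""
    total, visited = 0, 0
    for key, value in shared_index:
        visited += 1
        if segment_of(key, parallelism) == segment_id and predicate(value):
            total += value
    return total, visited


def segment_scan(segment, predicate):
    """Split index: the worker visits only the entries of its own segment."""
    total = sum(value for _, value in segment if predicate(value))
    return total, len(segment)


def two_step_sum(segments, predicate):
    """Map step: one scan per segment. Reduce step: merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda s: segment_scan(s, predicate), segments))
    return sum(t for t, _ in partials), sum(v for _, v in partials)


rows = [(key, key % 100) for key in range(10_000)]
parallelism = 4
is_large = lambda value: value >= 50

# Split indices: each worker reads only its own part.
split_total, split_visited = two_step_sum(build_split_index(rows, parallelism), is_large)

# Shared index: each of the four workers reads everything and filters.
shared = sorted(rows)
scans = [filtered_scan(shared, s, parallelism, is_large) for s in range(parallelism)]
shared_total = sum(t for t, _ in scans)
shared_visited = sum(v for _, v in scans)
```

Both approaches produce the same sum, but with a parallelism level of 4 the shared-index variant visits 40,000 entries in total while the split-index variant visits 10,000. Python threads do not run CPU-bound code in parallel, so the sketch shows the distribution of work rather than an actual speed-up.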

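
The prepared statement cache problem described in the quoted message can be sketched in the same spirit. The thread does not confirm the exact failure mode (Andrey writes "it seems", and Sergi suspects the parallelization itself may be wrong), so this is one possible reading only. `PreparedStatement` and `StatementCache` below are hypothetical stand-ins, not the H2 or Ignite classes: the sketch assumes a statement gets bound to one index part when it is prepared, and compares a thread-local cache keyed by SQL text alone with one keyed by SQL text and index part.

```python
import threading


class PreparedStatement:
    """Hypothetical stand-in: bound to one index part when it is prepared."""

    def __init__(self, sql, segment_id):
        self.sql = sql
        self.segment_id = segment_id

    def execute(self, segments):
        # Always reads the index part the statement was prepared against.
        return list(segments[self.segment_id])


class StatementCache:
    """Thread-local statement cache with a configurable cache key."""

    def __init__(self, key_by_segment):
        self._local = threading.local()
        self._key_by_segment = key_by_segment

    def get(self, sql, segment_id):
        cache = getattr(self._local, "statements", None)
        if cache is None:
            cache = self._local.statements = {}
        key = (sql, segment_id) if self._key_by_segment else sql
        if key not in cache:
            cache[key] = PreparedStatement(sql, segment_id)
        return cache[key]


segments = [["a0", "a1"], ["b0"]]
sql = "SELECT name FROM person"

# Keyed by SQL only: the second lookup reuses the statement bound to part 0.
naive = StatementCache(key_by_segment=False)
naive_first = naive.get(sql, 0).execute(segments)
naive_second = naive.get(sql, 1).execute(segments)

# Keyed by SQL and index part: each part gets its own statement.
keyed = StatementCache(key_by_segment=True)
keyed_first = keyed.get(sql, 0).execute(segments)
keyed_second = keyed.get(sql, 1).execute(segments)
```

Under these assumptions the SQL-only cache returns the rows of part 0 for both lookups, which matches the "unexpected results" symptom, while including the index part in the cache key keeps the statements separate at the cost of more cached statements per thread.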