Re: Question about the multi-thread scan node model

[email protected] Sun, 03 Sep 2017 20:17:02 -0700

Got it. Thanks Tim!


On 2017-09-01 00:53, Tim Armstrong <[email protected]> wrote: 
> I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
> it's a true ramp-up task. The code change is easy but I think we would want
> to do performance validation and testing to make sure that the new
> multithreaded scanners have similar performance and stability before making
> them the default.
> 
> On Thu, Aug 31, 2017 at 12:34 AM, [email protected] <
> [email protected]> wrote:
> 
> > Yeah, "compute stats" is really CPU bound. That sounds great!
> >
> > I noticed that one of the subtasks of the multithreading work is labeled
> > "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> > Is this in progress? If not, could you reassign it to me so I can get
> > familiar with the latest framework?
> >
> > Thanks,
> > Quanlong
> >
> > On 2017-08-31 07:16, Tim Armstrong <[email protected]> wrote:
> > > Hi,
> > >   The new scanner model is part of the multithreading work to support
> > > running multiple instances of each fragment on each Impala daemon. The idea
> > > there is that parallelisation is done at the fragment level so that all
> > > execution including aggregations, sorts, joins is parallelised - not just
> > > scans. This is enabled by setting mt_dop > 0. Currently it doesn't work for
> > > plans including joins and HDFS inserts.
> > >
> > > We find that a lot of queries are compute bound, particularly by
> > > aggregations and joins. In those cases we get big speedups from the newer
> > > multithreading model. E.g. "compute stats" is a lot faster.
> > >
> > > On Wed, Aug 30, 2017 at 3:50 PM, Quanlong <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > I'm working on applying our orc-support patch to the latest code base
> > > > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since
> > > > our patch is based on cdh-5.7.3-release, which was released one year ago,
> > > > there's a lot of work to merge it.
> > > >
> > > >
> > > > One of the biggest changes from cdh-5.7.3-release I notice is the new scan
> > > > node & scanner model introduced in IMPALA-3902
> > > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it's inspired
> > > > by the investigation task in IMPALA-2849
> > > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any
> > > > performance report in that issue. Could you share a report about this
> > > > multi-thread refactor?
> > > >
> > > >
> > > > I'm wondering how much this can improve performance, since the old
> > > > single-thread scan node & multi-thread scanners model already supplied
> > > > concurrent IO for reading, and most OLAP queries are IO bound.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Quanlong
> > > >
> > >
> >
>
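
The fragment-level parallelism Tim describes (several instances of a fragment, each doing its own scan and aggregation, with results combined afterwards) can be pictured with a small toy sketch. This is not Impala code: the names run_fragment_instance, run_query and dop are hypothetical and chosen only for illustration, with dop loosely standing in for the mt_dop setting mentioned in the thread. Python threads also do not give real CPU parallelism here, so the sketch shows the structure only, not the speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_fragment_instance(rows):
    """One hypothetical fragment instance: scan its rows and pre-aggregate them."""
    partial = Counter()
    for key, value in rows:       # "scan" step
        partial[key] += value     # "aggregation" step, done inside the same instance
    return partial


def run_query(rows, dop):
    """Split the input across `dop` instances, then merge their partial results."""
    chunks = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(run_fragment_instance, chunks))
    total = Counter()
    for partial in partials:      # final merge of the per-instance aggregates
        total.update(partial)
    return dict(total)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = run_query(rows, dop=2)
print(result)
```

In the older model discussed in the thread, only the scan step would be spread across threads and the aggregation would run once over all rows; in this sketch each instance aggregates its own share, which is the part that helps when a query is compute bound.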
