Got it. Thanks Tim!
On 2017-09-01 00:53, Tim Armstrong <[email protected]> wrote:
> I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
> it's a true ramp-up task. The code change is easy, but I think we would want
> to do performance validation and testing to make sure that the new
> multithreaded scanners have similar performance and stability before making
> them the default.
>
> On Thu, Aug 31, 2017 at 12:34 AM, [email protected] <
> [email protected]> wrote:
>
> > Yeah, "compute stats" is really CPU-bound. That sounds great!
> >
> > I noticed that one of the subtasks of the multithreading work is labeled
> > "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> > Is this in progress? If not, could you reassign it to me so I can get
> > familiar with the latest framework?
> >
> > Thanks,
> > Quanlong
> >
> > On 2017-08-31 07:16, Tim Armstrong <[email protected]> wrote:
> > > Hi,
> > > The new scanner model is part of the multithreading work to support
> > > running multiple instances of each fragment on each Impala daemon. The
> > > idea there is that parallelisation is done at the fragment level, so that
> > > all execution, including aggregations, sorts and joins, is parallelised,
> > > not just scans. This is enabled by setting mt_dop > 0. Currently it
> > > doesn't work for plans that include joins or HDFS inserts.
> > >
> > > We find that a lot of queries are compute-bound, particularly by
> > > aggregations and joins. In those cases we get big speedups from the newer
> > > multithreading model. E.g. "compute stats" is a lot faster.
> > >
> > > On Wed, Aug 30, 2017 at 3:50 PM, 黿é <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm working on applying our ORC-support patch to the latest code base
> > > > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>).
> > > > Since our patch is based on cdh-5.7.3-release, which was released one
> > > > year ago, there's a lot of work to merge it.
> > > >
> > > > One of the biggest changes since cdh-5.7.3-release that I notice is the
> > > > new scan node & scanner model introduced in IMPALA-3902
> > > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it's
> > > > inspired by the investigation task in IMPALA-2849
> > > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find
> > > > any performance report in that issue. Could you share any performance
> > > > numbers for this multithreading refactor?
> > > >
> > > > I'm wondering how much this can improve performance, since the old
> > > > model (a single-threaded scan node with multithreaded scanners) already
> > > > provides concurrent IO for reading, and most OLAP queries are IO-bound.
> > > >
> > > > Thanks,
> > > >
> > > > Quanlong
