GitHub user funes79 opened a pull request:

    https://github.com/apache/incubator-impala/pull/1

    Branch 2.8.0

    I would like to submit my first pull request for Impala. We have been 
using it in production for almost 3 years.
    I would like to suggest an improvement to the behaviour of compute 
incremental stats. 
    We have a very large table, initially migrated from another cluster, and 
we had to create stats on it. Compute incremental stats failed (was skipped) 
after 4 hours, and by that time, based on HDFS reads, almost 90% of the table 
had been scanned. Unfortunately Impala did not store the partition statistics 
(daily partitions), so when I checked the stats they showed false everywhere. 
The performance of compute stats is also very poor: it looks like it scans 
the table partition by partition, and if a partition is small (on one node) 
the other nodes stay idle.  
    Two improvements I would suggest:
     - write the calculated stats immediately after each partition's stats 
are gathered
     - if the table has a large number of partitions (3 years, 1000 
partitions), scan in parallel at least as many partitions as there are Impala 
daemons configured.
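    Until something like this exists in Impala itself, the two suggestions 
can be approximated from the client side. The following is only a minimal 
sketch, not part of this pull request: it issues one COMPUTE INCREMENTAL 
STATS ... PARTITION statement per partition, so a failure partway through 
only affects the statements that failed, and keeps a configurable number of 
statements in flight. The table name, the partition column and the `run` 
callable (whatever submits a statement to Impala) are hypothetical 
placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def stats_statements(table: str, part_col: str, values: Iterable[str]) -> List[str]:
    """Build one COMPUTE INCREMENTAL STATS statement per partition value."""
    return [
        f"COMPUTE INCREMENTAL STATS {table} PARTITION ({part_col}='{v}')"
        for v in values
    ]


def run_in_parallel(
    statements: List[str], run: Callable[[str], None], workers: int
) -> List[str]:
    """Run statements with at most `workers` in flight.

    `run` is a hypothetical callable that submits one statement to Impala.
    Returns the statements that completed without raising.
    """

    def attempt(stmt: str) -> bool:
        try:
            run(stmt)
            return True
        except Exception:
            # A failed partition does not stop the remaining ones.
            return False

    done: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `statements`.
        for stmt, ok in zip(statements, pool.map(attempt, statements)):
            if ok:
                done.append(stmt)
    return done
```

    Setting `workers` to the number of Impala daemons mirrors the second 
suggestion; whether that actually keeps every node busy depends on how the 
cluster schedules the individual statements.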
    
    Thanks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-impala branch-2.8.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-impala/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1
    
----
commit 2423d23f8a84f4b38d2250ae0598207aeda243b2
Author: Jim Apple <[email protected]>
Date:   2017-01-06T23:53:24Z

    Update VERSION to begin release candidate testing
    
    Change-Id: I0fcec577babba0929600d540936bb154a42dee50

commit 95e9479c12a3ba6fdfed25ae88467c8ba4622ad2
Author: Jim Apple <[email protected]>
Date:   2017-01-05T16:19:28Z

    Add disclaimer to docs: Cloudera-specific info still present.
    
    While we are working on excising it, we don't want users to be
    confused about what the manual is intended to describe.
    
    Change-Id: I7740189fd7ff7f22d8471f037e190d9923521936
    Reviewed-on: http://gerrit.cloudera.org:8080/5610
    Reviewed-by: Tim Armstrong <[email protected]>
    Tested-by: Impala Public Jenkins

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
