GitHub user funes79 opened a pull request:
https://github.com/apache/incubator-impala/pull/1
Branch 2.8.0
I would like to register my first pull request for Impala. We have been
using it in production for almost 3 years.
I would like to suggest improving the behaviour of COMPUTE INCREMENTAL
STATS.
We have a very large table, initially migrated from another cluster, and
we had to compute stats on it. COMPUTE INCREMENTAL STATS failed (was skipped)
after 4 hours, and by then, judging by HDFS reads, almost 90% of the table had
been scanned. Unfortunately Impala did not store the statistics of the
partitions it had already processed (daily partitions), so when I checked the
stats, every partition showed false. The performance of COMPUTE STATS is also
very poor: it appears to scan the table partition by partition, and if a
partition is small (stored on one node), the other nodes stay idle.
Two improvements I would suggest:
- write the calculated stats immediately after each partition's stats are
gathered
- if the table has a large number of partitions (3 years, about 1000
partitions), scan at least as many partitions in parallel as there are
Impala daemons configured.
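A minimal sketch of the two suggestions above, written in Python for illustration only and assuming a simple driver loop: partitions are processed in waves of at most as many partitions as there are Impala daemons, and each wave's statistics are persisted before the next wave starts, so a failure late in the run keeps the work already done. `plan_waves`, `compute_stats_in_waves`, and the `compute_partition`/`persist` callbacks are hypothetical names, not Impala APIs, and this is not how Impala currently implements COMPUTE INCREMENTAL STATS.

```python
from typing import Callable, Dict, List, TypeVar

Stats = TypeVar("Stats")


def plan_waves(partitions: List[str], num_impalads: int) -> List[List[str]]:
    """Split partitions into waves of at most num_impalads partitions each."""
    if num_impalads < 1:
        raise ValueError("num_impalads must be at least 1")
    return [
        partitions[i:i + num_impalads]
        for i in range(0, len(partitions), num_impalads)
    ]


def compute_stats_in_waves(
    partitions: List[str],
    num_impalads: int,
    compute_partition: Callable[[str], Stats],
    persist: Callable[[Dict[str, Stats]], None],
) -> int:
    """Hypothetical driver: compute each wave, then persist it immediately.

    Returns the number of partitions whose stats were persisted.
    """
    persisted = 0
    for wave in plan_waves(partitions, num_impalads):
        # The suggestion is to scan these partitions concurrently, one per
        # impalad; this sketch calls them sequentially for simplicity.
        results = {p: compute_partition(p) for p in wave}
        # Persist before starting the next wave, so an exception raised by a
        # later wave does not discard stats that were already computed.
        persist(results)
        persisted += len(results)
    return persisted


if __name__ == "__main__":
    daily = [f"day={d:04d}" for d in range(1000)]
    print(len(plan_waves(daily, 10)))  # 100 waves of 10 partitions
```

With 1000 daily partitions and 10 daemons this gives 100 waves; if the scan fails in wave 90, the stats for the first 89 waves (890 partitions) have already been persisted instead of being lost.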
Thanks
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-impala branch-2.8.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-impala/pull/1.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1
----
commit 2423d23f8a84f4b38d2250ae0598207aeda243b2
Author: Jim Apple <[email protected]>
Date: 2017-01-06T23:53:24Z
Update VERSION to begin release candidate testing
Change-Id: I0fcec577babba0929600d540936bb154a42dee50
commit 95e9479c12a3ba6fdfed25ae88467c8ba4622ad2
Author: Jim Apple <[email protected]>
Date: 2017-01-05T16:19:28Z
Add disclaimer to docs: Cloudera-specific info still present.
While we are working on excising it, we don't want users to be
confused about what the manual is intended to describe.
Change-Id: I7740189fd7ff7f22d8471f037e190d9923521936
Reviewed-on: http://gerrit.cloudera.org:8080/5610
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
----