Hi Jeszy,

Thanks for your reply. On another cluster with two instances, I ran the same SQL on a smaller file:

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
|  partitions=1
|  mem-estimate=1.00GB mem-reservation=0B
|
00:SCAN HDFS [default.cdr_partition, RANDOM]
   partitions=1/1 files=1 size=762.93MB

And the single file is split:

Averaged Fragment F00 <http://192.168.33.22:7180/cmf/impala/queryDetails?queryId=cb433d9e02457f39%3A247dc1f100000000&serviceName=impala#>
 - split sizes: *min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB, stddev: 2.54 MB*

Is something misconfigured in my cluster?

2017-08-03 13:20 GMT+08:00 Jeszy <[email protected]>:

> Putting some more files in the source table will allow you to use more
> hosts.
>
> On 3 August 2017 at 05:08, Taras Bobrovytsky <[email protected]> wrote:
> > Yes, it looks like all the work is being done on a single node because
> > hosts=1.
> >
> > On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <[email protected]> wrote:
> >
> >> This is my Impala cluster:
> >>
> >> Role Type              State                                Host             Commission State  Role Group
> >> Impala Catalog Server  Started with Outdated Configuration  cdha0.embed.com  Commissioned      Impala Catalog Server Default Group
> >> Impala Daemon          Started                              cdha2.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started                              cdha1.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started with Outdated Configuration  cdha3.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started                              cdha4.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala StateStore      Started                              cdha0.embed.com  Commissioned      Impala StateStore Default Group
> >>
> >> When I run this SQL:
> >>
> >> insert into table cdr_partition_true partition(ym = '2014-11') select
> >> call_1,
> >> call_2,
> >> type_1,
> >> own_1,
> >> own_2,
> >> hdfs_id,
> >> a_imsi,
> >> p_imsi,
> >> a_imei,
> >> p_imei,
> >> CAST(unix_timestamp(start_time) AS INT),
> >> CAST(unix_timestamp(end_time) AS INT),
> >> time,
> >> a_LAC,
> >> a_CI,
> >> p_LAC,
> >> p_CI
> >> from cdr_partition_cwang
> >>
> >> The EXPLAIN output shows only one host:
> >>
> >> ----------------
> >> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
> >> WARNING: The following tables are missing relevant table and/or column
> >> statistics.
> >> default.cdr_partition_cwang
> >>
> >> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false,
> >> PARTITION-KEYS=('2014-11')]
> >> |  partitions=1
> >> |  hosts=1 per-host-mem=1.00GB
> >> |
> >> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
> >>    partitions=1/1 files=1 size=2.00GB
> >>    table stats: unavailable
> >>    column stats: unavailable
> >>    hosts=1 per-host-mem=1.80GB
> >>    tuple-ids=0 row-size=128B cardinality=unavailable
> >> ----------------
> >>
> >> And the number of instances is 1 -> Average Fragment F00.num instances: 1
> >>
> >> Does this mean my work was performed on only one Impala node?
> >>
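
As a rough sanity check on the split sizes reported for the two-instance run, the arithmetic can be sketched in Python. The two split sizes and the file size are taken from the profile output in this thread; the 128 MB block size is only an assumption (a common HDFS default, not something stated here), and `summarize` is a hypothetical helper name.

```python
# Values copied from the plan and profile quoted in this thread (in MB).
FILE_SIZE_MB = 762.93
SPLIT_SIZES_MB = [378.93, 384.00]

# Assumption: HDFS block size of 128 MB. The thread does not state the
# actual block size of either cluster.
ASSUMED_BLOCK_MB = 128


def summarize(splits):
    """Return total, min, max, average and population stddev of split sizes."""
    n = len(splits)
    avg = sum(splits) / n
    variance = sum((s - avg) ** 2 for s in splits) / n
    return {
        "total": sum(splits),
        "min": min(splits),
        "max": max(splits),
        "avg": avg,
        "stddev": variance ** 0.5,
    }


stats = summarize(SPLIT_SIZES_MB)

# How many assumed-size blocks the larger split would cover.
blocks_in_larger_split = max(SPLIT_SIZES_MB) / ASSUMED_BLOCK_MB

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.2f} MB")
    print(f"blocks in larger split (assumed 128 MB): {blocks_in_larger_split}")
```

The two splits add up to the size of the single source file, so the file was divided between the two instances rather than read twice. Under the assumed block size, the larger split is a whole number of blocks, which would be consistent with the scan being divided along block boundaries; whether that explains the hosts=1 plan on the other cluster depends on that file's format and block layout, which the thread does not show.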
