Hi Qiao, You can also send us the DDL and the query plan to help with the tuning.
To generate a query plan, do the following from sqlci, and the plan is in text file mylog. 1. log mylog clear; 2. prepare xx from <your query>; 3. explain xx; 4. explain options 'f' xx; 5. exit; Thanks --Qifan On Mon, Sep 12, 2016 at 8:21 AM, Eric Owhadi <[email protected]> wrote: > Hello Qiao, > When you say whatever the table ddl it limits things a lot, as table ddl > will help define several things that will drastically improve the degree of > parallelism and the table size. > > The DOP (degree of parallelism) of the scan operator is constrained by the > number of regions your table uses. So if you want to increase DOP, you need > to partition your table using the syntax like: > create table customer_demographics_salt > ( > cd_demo_sk int not null > , cd_gender char(1) > , cd_marital_status char(1) > , cd_education_status char(20) > , cd_purchase_estimate int > , cd_credit_rating char(10) > , cd_dep_count int > , cd_dep_employed_count int > , cd_dep_college_count int > , primary key (cd_demo_sk) > ) > salt using 12 partitions > ATTRIBUTES ALIGNED FORMAT > HBASE_OPTIONS > ( > DATA_BLOCK_ENCODING = 'FAST_DIFF', > COMPRESSION = 'SNAPPY' > ); > > you can experiment with different values of number of partitions (but pick > a > multiple of 3 since you have 3 nodes). > > Then the optimizer will pick the DOP with a compromise of resource usage vs > gain in speed. > If you want to force higher DOP than what optimizer selected, you can use : > CQD parallel_num_esps '12'; > To force it to one ESP per partition (assuming you picked 12 partitions). > You can verify what optimizer picked as DOP by doing an explain on the > query. > > Other important factors plays in performance: > - use of aligned format (see above example) > - careful choice of the primary key (bad idea to use a varchar with big max > size) > - It is good idea to use compression and encoding (see the most common > options we use above) > - you can also increase the HBASE_OPTIONS BLOCKSIZE parameter, but there is > a drawback: increasing it will increase performance of SCAN but decrease > performance of keyed access. That is why I did not include it in the > example > above. > > Hope this helps, > Regards, > Eric > > > -----Original Message----- > From: 乔彦克 [mailto:[email protected]] > Sent: Monday, September 12, 2016 1:22 AM > To: [email protected]; [email protected]. > apache.org > Subject: trafodion query optimization > > Hi all, > I executed the sum and count query on my table where the cluster has > three nodes. I found that the sum query is not well parallel executed(not > all the three nodes get high load when executing the sum query) and the cpu > load is very high while the memory load is very low(the machines have 16 > cores and 16GB memory). My sum query on the 12 million data sets takes > about > 2 minutes and a half time. > So my question is that is there any optimization advice that I can use > to improve the query performance and maximize the usage of my machines, > what > ever the configuration or the table ddl. > Any replies is appreciated. > > Thanks, > Qiao > -- Regards, --Qifan
