Hi Qiao,

You can also send us the DDL and the query plan to help with the tuning.

To generate a query plan, do the following from sqlci, and the plan is in
text file mylog.


   1. log mylog clear;
   2. prepare xx from <your query>;
   3. explain xx;
   4. explain options 'f' xx;
   5. exit;


Thanks --Qifan

On Mon, Sep 12, 2016 at 8:21 AM, Eric Owhadi <[email protected]> wrote:

> Hello Qiao,
> When you say whatever the table ddl it limits things a lot, as table ddl
> will help define several things that will drastically improve the degree of
> parallelism and the table size.
>
> The DOP (degree of parallelism) of the scan operator is constrained by the
> number of regions your table uses. So if you want to increase DOP, you need
> to partition your table using the syntax like:
> create table customer_demographics_salt
> (
>  cd_demo_sk int not null
>  , cd_gender char(1)
>  , cd_marital_status char(1)
>  , cd_education_status char(20)
>  , cd_purchase_estimate int
>  , cd_credit_rating char(10)
>  , cd_dep_count int
>  , cd_dep_employed_count int
>  , cd_dep_college_count int
>  , primary key (cd_demo_sk)
> )
> salt using 12 partitions
> ATTRIBUTES ALIGNED FORMAT
> HBASE_OPTIONS
>   (
>     DATA_BLOCK_ENCODING = 'FAST_DIFF',
>     COMPRESSION = 'SNAPPY'
>   );
>
> you can experiment with different values of number of partitions (but pick
> a
> multiple of 3 since you have 3 nodes).
>
> Then the optimizer will pick the DOP with a compromise of resource usage vs
> gain in speed.
> If you want to force higher DOP than what optimizer selected, you can use :
> CQD parallel_num_esps '12';
> To force it to one ESP per partition (assuming you picked 12 partitions).
> You can verify what optimizer picked as DOP by doing an explain on the
> query.
>
> Other important factors plays in performance:
> - use of aligned format (see above example)
> - careful choice of the primary key (bad idea to use a varchar with big max
> size)
> - It is good idea to use compression and encoding (see the most common
> options we use above)
> - you can also increase the HBASE_OPTIONS BLOCKSIZE parameter, but there is
> a drawback: increasing it will increase performance of SCAN but decrease
> performance of keyed access. That is why I did not include it in the
> example
> above.
>
> Hope this helps,
> Regards,
> Eric
>
>
> -----Original Message-----
> From: 乔彦克 [mailto:[email protected]]
> Sent: Monday, September 12, 2016 1:22 AM
> To: [email protected]; [email protected].
> apache.org
> Subject: trafodion query optimization
>
> Hi all,
>      I executed the sum and count query on my table where the cluster has
> three nodes. I found that the sum query is not well parallel executed(not
> all the three nodes get high load when executing the sum query) and the cpu
> load is very high while the memory load is very low(the machines have 16
> cores and 16GB memory). My sum query on the 12 million data sets takes
> about
> 2 minutes and a half time.
>     So my question is that is there any optimization advice that I can use
> to improve the query performance and maximize the usage of my machines,
> what
> ever the configuration or the table ddl.
>     Any replies is appreciated.
>
> Thanks,
> Qiao
>



-- 
Regards, --Qifan

Reply via email to