Re: trafodion query optimization

乔彦克 Mon, 19 Sep 2016 01:15:40 -0700

Thanks Eric and Qifan. I am sorry to reply after so long a time because I'm
on the Chinese mid-autumn festival holiday.
According to Qifan's advice, I upload a log which contains the DDL and the
query plan hope to get more advice.


and to Eric, I summit a jira about the block-encoding and the compression,
https://issues.apache.org/jira/browse/TRAFODION-2195, so I only use the
hbase compression.

Qifan Chen <[email protected]>于2016年9月12日周一 下午10:43写道：

> Hi Qiao,
>
> You can also send us the DDL and the query plan to help with the tuning.
>
> To generate a query plan, do the following from sqlci, and the plan is in
> text file mylog.
>
>
>    1. log mylog clear;
>    2. prepare xx from <your query>;
>    3. explain xx;
>    4. explain options 'f' xx;
>    5. exit;
>
>
> Thanks --Qifan
>
> On Mon, Sep 12, 2016 at 8:21 AM, Eric Owhadi <[email protected]>
> wrote:
>
>> Hello Qiao,
>> When you say whatever the table ddl it limits things a lot, as table ddl
>> will help define several things that will drastically improve the degree
>> of
>> parallelism and the table size.
>>
>> The DOP (degree of parallelism) of the scan operator is constrained by the
>> number of regions your table uses. So if you want to increase DOP, you
>> need
>> to partition your table using the syntax like:
>> create table customer_demographics_salt
>> (
>>  cd_demo_sk int not null
>>  , cd_gender char(1)
>>  , cd_marital_status char(1)
>>  , cd_education_status char(20)
>>  , cd_purchase_estimate int
>>  , cd_credit_rating char(10)
>>  , cd_dep_count int
>>  , cd_dep_employed_count int
>>  , cd_dep_college_count int
>>  , primary key (cd_demo_sk)
>> )
>> salt using 12 partitions
>> ATTRIBUTES ALIGNED FORMAT
>> HBASE_OPTIONS
>>   (
>>     DATA_BLOCK_ENCODING = 'FAST_DIFF',
>>     COMPRESSION = 'SNAPPY'
>>   );
>>
>> you can experiment with different values of number of partitions (but
>> pick a
>> multiple of 3 since you have 3 nodes).
>>
>> Then the optimizer will pick the DOP with a compromise of resource usage
>> vs
>> gain in speed.
>> If you want to force higher DOP than what optimizer selected, you can use
>> :
>> CQD parallel_num_esps '12';
>> To force it to one ESP per partition (assuming you picked 12 partitions).
>> You can verify what optimizer picked as DOP by doing an explain on the
>> query.
>>
>> Other important factors plays in performance:
>> - use of aligned format (see above example)
>> - careful choice of the primary key (bad idea to use a varchar with big
>> max
>> size)
>> - It is good idea to use compression and encoding (see the most common
>> options we use above)
>> - you can also increase the HBASE_OPTIONS BLOCKSIZE parameter, but there
>> is
>> a drawback: increasing it will increase performance of SCAN but decrease
>> performance of keyed access. That is why I did not include it in the
>> example
>> above.
>>
>> Hope this helps,
>> Regards,
>> Eric
>>
>>
>> -----Original Message-----
>> From: 乔彦克 [mailto:[email protected]]
>> Sent: Monday, September 12, 2016 1:22 AM
>> To: [email protected];
>> [email protected]
>> Subject: trafodion query optimization
>>
>> Hi all,
>>      I executed the sum and count query on my table where the cluster has
>> three nodes. I found that the sum query is not well parallel executed(not
>> all the three nodes get high load when executing the sum query) and the
>> cpu
>> load is very high while the memory load is very low(the machines have 16
>> cores and 16GB memory). My sum query on the 12 million data sets takes
>> about
>> 2 minutes and a half time.
>>     So my question is that is there any optimization advice that I can use
>> to improve the query performance and maximize the usage of my machines,
>> what
>> ever the configuration or the table ddl.
>>     Any replies is appreciated.
>>
>> Thanks,
>> Qiao
>>
>
>
>
> --
> Regards, --Qifan
>
>

Re: trafodion query optimization

Reply via email to