Hi Bill,

I got your point, but making no-dictionary the default may not be a
perfect solution. Basically, no-dictionary columns are only meant for
high-cardinality dimensions, so the usage may change from user to user or
scenario to scenario.
This is basically an issue of DDL usability, so please focus first on
simplifying the DDL.

For example, if we have 6 columns, we can write the DDL as below.
case 1:
SORT_COLUMNS="C1,C2,C3"
NON_SORT_COLUMNS="C4,C5,C6"
In the above case C1, C2, C3 are sort columns and part of the MDK key, and
C4, C5, C6 become non-sort columns (measure/complex).

DICTIONARY_EXCLUDE= 'ALL'
DICTIONARY_INCLUDE='C3'
In the above case all sort columns (C1,C2,C3) are non-dictionary columns
except C3; here C3 is a dictionary column.

case 2:
SORT_COLUMNS="ALL"
NON_SORT_COLUMNS="C6"
In this case all columns are sort columns except C6.

DICTIONARY_EXCLUDE= 'C2'
DICTIONARY_INCLUDE='ALL'
In the above case all sort columns (C1,C2,C3,C4,C5) are dictionary columns
except C2; here C2 is a no-dictionary column.
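
To make the intended semantics of the two cases concrete, here is a rough
Python sketch of how such an include/exclude pair with an 'ALL' keyword
could be resolved. This is only an illustration of the idea, not existing
CarbonData code: the function name resolve_columns and the rule that only
one side of a pair may be 'ALL' are my assumptions.

```python
def resolve_columns(candidates, include, exclude):
    """Hypothetical resolver for an include/exclude property pair.

    candidates: ordered list of column names the pair applies to.
    include / exclude: comma-separated column names, or the keyword ALL.
    Returns the columns that end up on the "include" side.
    """
    def parse(value):
        value = value.strip()
        if value.upper() == "ALL":
            return "ALL"
        return [c.strip() for c in value.split(",") if c.strip()]

    inc, exc = parse(include), parse(exclude)
    if inc == "ALL" and exc == "ALL":
        # Assumption: the proposal does not define this, so reject it.
        raise ValueError("only one of include/exclude may be ALL")
    if inc == "ALL":
        # Everything is included except the explicitly excluded columns.
        return [c for c in candidates if c not in exc]
    # Otherwise only the explicitly listed columns are included
    # (this also covers exclude == ALL).
    return [c for c in candidates if c in inc]


columns = ["C1", "C2", "C3", "C4", "C5", "C6"]

# case 1: SORT_COLUMNS="C1,C2,C3", NON_SORT_COLUMNS="C4,C5,C6"
sort_1 = resolve_columns(columns, "C1,C2,C3", "C4,C5,C6")
# DICTIONARY_INCLUDE='C3', DICTIONARY_EXCLUDE='ALL' (applies to sort columns)
dict_1 = resolve_columns(sort_1, "C3", "ALL")

# case 2: SORT_COLUMNS="ALL", NON_SORT_COLUMNS="C6"
sort_2 = resolve_columns(columns, "ALL", "C6")
# DICTIONARY_INCLUDE='ALL', DICTIONARY_EXCLUDE='C2'
dict_2 = resolve_columns(sort_2, "ALL", "C2")

print(sort_1, dict_1)
print(sort_2, dict_2)
```

With this rule, case 1 gives C3 as the only dictionary column and case 2
gives every sort column except C2, which matches the description above.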

The above is just my idea of how to simplify the DDL to handle all
scenarios. We can discuss further how best to simplify it.

Regards,
Ravindra.

On 27 February 2017 at 12:38, bill.zhou <zgcsk...@163.com> wrote:

> Dear Vishal & Ravindra
>
>   Thanks for your reply. I think I didn't describe it clearly, so you
> didn't get the full idea.
> 1. Dictionary is an important feature in CarbonData, and we introduce it to
> every new customer. So a new customer will know it clearly and will set
> the dictionary columns when creating a table.
> 2. All customers, such as bank, telecom and traffic customers, share the
> same scenario: they have many columns but set only a few columns as
> dictionary.
>     For a telecom customer, out of 300 columns only 5 are set as
> dictionary; the other dims are not set as dictionary.
>     For a bank customer, out of 100 columns only about 5 are set as
> dictionary; the other dims are not set as dictionary.
> *In current customers' actual usage scenarios, they only set as dictionary
> the dims used for filter and group by*
> 3. My suggestion is that defaulting to no dictionary applies only to the
> dims which are not put into the dictionary_include property, not to all
> dim columns. If the customer always adds the 5 columns they use to
> dictionary_include and leaves the other columns as no dictionary, this
> will not impact the query performance.
>
> So I suggest that dim columns which are not added to the
> dictionary_include property default to no dictionary.
>
> Regards
> Bill
>
>
>
> kumarvishal09 wrote
> > Hi,
> >     I completely agree with Ravindra's points. A larger number of
> > no-dictionary columns will impact both IO reading and writing, because
> > the data size increases for no-dictionary columns. Late decoding is one
> > of the main advantages, and no-dictionary column aggregation will be
> > slower. Filter queries will suffer because for dictionary columns we
> > compare byte-packed values, while for no-dictionary columns we compare
> > the actual values.
> >
> > -Regards
> > Kumar Vishal
> >
> > On Mon, Feb 27, 2017 at 12:34 AM, Ravindra Pesala <ravi.pesala@>
> > wrote:
> >
> >> Hi,
> >>
> >> I feel there are more disadvantages than advantages in this approach. In
> >> your current scenario you want to set dictionary only for columns which
> >> are used as filters, but the usage of dictionary is not limited to
> >> filters; it can also reduce the store size and improve aggregation
> >> queries. I think you should set no_inverted_index false on non filtered
> >> columns to reduce the store size and improve the performance.
> >>
> >> If we make no dictionary the default, then the user need not set those
> >> columns in the DDL, but the user needs to set the dictionary columns. If
> >> the user wants to set more dictionary columns, then the same problem you
> >> mentioned arises again, so it does not solve the problem. I feel we
> >> should give more flexibility in our DDL to simplify these scenarios, and
> >> we should have more discussion on it.
> >>
> >> Pros & Cons of your suggestion.
> >> Advantages :
> >> 1. Decoding/Encoding of dictionary could be avoided.
> >>
> >> Disadvantages :
> >> 1. Store size will increase drastically.
> >> 2. IO will increase so query performance will come down.
> >> 3. Aggregation queries performance will suffer.
> >>
> >>
> >>
> >> Regards,
> >> Ravindra.
> >>
> >> On 26 February 2017 at 20:04, bill.zhou <zgcsky08@> wrote:
> >>
> >> > hi All
> >> >     Currently, when creating a CarbonData table, if a dimension is
> >> > not added to the dictionary_exclude property, the dimension is
> >> > treated as a dictionary column by default. I think the default should
> >> > be no dictionary.
> >> >
> >> >     For example, when I did the POC for one customer, the table had
> >> > 300 columns and 200 dimensions, but only 5 columns are used for
> >> > filters, so he only needs to set these 5 columns as dictionary and
> >> > leave the other 195 columns as no dictionary. But now he needs to
> >> > specify the 195 columns in the dictionary_exclude property, which
> >> > wastes time, makes the create table command huge, and also impacts
> >> > the load performance.
> >> >
> >> >     So I suggest that dimensions should default to no dictionary;
> >> > this can also help the customer easily see which dictionary columns
> >> > are useful.
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context: http://apache-carbondata-
> >> > mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-
> >> > dimension-default-should-be-no-dictionary-tp8010.html
> >> > Sent from the Apache CarbonData Mailing List archive mailing list
> >> archive
> >> > at Nabble.com.
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-
> dimension-default-should-be-no-dictionary-tp8010p8027.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>



-- 
Thanks & Regards,
Ravi
