Hi,

+1 for adding "DICTIONARY_EXCLUDE='ALL'" and "DICTIONARY_INCLUDE='ALL'" to improve the usability of the DDL. This solution is more flexible than making no-dictionary the default.

Regards
Liang

2017-02-27 20:27 GMT+08:00 Ravindra Pesala <ravi.pes...@gmail.com>:

> Hi Bill,
>
> I got your point, but making no-dictionary the default may not be a
> perfect solution. Basically, no-dictionary columns are only meant for
> high-cardinality dimensions, so the usage may change from user to user
> or scenario to scenario.
> This is a basic DDL usability issue, so please focus first on
> simplifying the DDL.
>
> For example, if we have 6 columns, we can write the DDL as below.
>
> case 1:
> SORT_COLUMNS="C1,C2,C3"
> NON_SORT_COLUMNS="C4,C5,C6"
> In the above case C1, C2, C3 are sort columns and part of the MDK key,
> and C4, C5, C6 become non-sort columns (measure/complex).
>
> DICTIONARY_EXCLUDE='ALL'
> DICTIONARY_INCLUDE='C3'
> In the above case all sort columns (C1,C2,C3) are no-dictionary columns
> except C3; here C3 is a dictionary column.
>
> case 2:
> SORT_COLUMNS="ALL"
> NON_SORT_COLUMNS="C6"
> In this case all columns are sort columns except C6.
>
> DICTIONARY_EXCLUDE='C2'
> DICTIONARY_INCLUDE='ALL'
> In the above case all sort columns (C1,C2,C3,C4,C5) are dictionary
> columns except C2; here C2 is a no-dictionary column.
>
> The above is just my idea of how to simplify the DDL to handle all
> scenarios. We can discuss further how to simplify the DDL.
>
> Regards,
> Ravindra.
>
> On 27 February 2017 at 12:38, bill.zhou <zgcsk...@163.com> wrote:
>
> > Dear Vishal & Ravindra
> >
> > Thanks for your reply. I think I didn't describe it clearly, so you
> > didn't get the full idea.
> > 1. Dictionary is an important feature in CarbonData, and we introduce
> > it to every new customer. So new customers know it clearly and will
> > set the dictionary columns when creating a table.
> > 2. All customers, such as bank, telecom and traffic customers, share
> > the same scenario: they have many columns but set only a few of them
> > as dictionary.
> >    For a telecom customer with 300 columns, only 5 columns are set as
> >    dictionary; the other dimensions are not.
> >    For a bank customer with 100 columns, only about 5 columns are set
> >    as dictionary; the other dimensions are not.
> > *In customers' actual usage scenarios today, they only set as
> > dictionary the dimensions used for filter and group by.*
> > 3. My suggestion is that the no-dictionary default applies only to the
> > dimensions that are not listed in the dictionary_include property, not
> > to all dimension columns. If a customer always adds the 5 columns they
> > use to dictionary_include and leaves the other columns as
> > no-dictionary, this will not impact the query performance.
> >
> > So I suggest that dimension columns which are not added to the
> > dictionary_include property default to no dictionary.
> >
> > Regards
> > Bill
> >
> > kumarvishal09 wrote
> > > Hi,
> > > I completely agree with Ravindra's points. A larger number of
> > > no-dictionary columns will impact IO for both reading and writing,
> > > because the data size increases with no dictionary. Late decoding is
> > > one of the main advantages, and aggregation on no-dictionary columns
> > > will be slower. Filter queries will suffer, because for a dictionary
> > > column we compare on the byte-packed value, while for no dictionary
> > > the comparison is on the actual value.
> > >
> > > -Regards
> > > Kumar Vishal
> > >
> > > On Mon, Feb 27, 2017 at 12:34 AM, Ravindra Pesala <ravi.pesala@>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I feel there are more disadvantages than advantages in this
> > >> approach. In your current scenario you want to set dictionary only
> > >> for columns which are used as filters, but the usage of dictionary
> > >> is not limited to filters; it can also reduce the store size and
> > >> improve aggregation queries.
> > >> I think you should set no_inverted_index false on non-filtered
> > >> columns to reduce the store size and improve the performance.
> > >>
> > >> If we make no-dictionary the default, then the user need not set
> > >> those columns in the DDL, but the user needs to set the dictionary
> > >> columns instead. If the user wants to set more dictionary columns,
> > >> then the same problem you mentioned arises again, so it does not
> > >> solve the problem. I feel we should give more flexibility in our
> > >> DDL to simplify these scenarios, and we should have more
> > >> discussion on it.
> > >>
> > >> Pros & cons of your suggestion.
> > >> Advantages:
> > >> 1. Decoding/encoding of dictionary could be avoided.
> > >>
> > >> Disadvantages:
> > >> 1. Store size will increase drastically.
> > >> 2. IO will increase, so query performance will come down.
> > >> 3. Aggregation query performance will suffer.
> > >>
> > >> Regards,
> > >> Ravindra.
> > >>
> > >> On 26 February 2017 at 20:04, bill.zhou <zgcsky08@> wrote:
> > >>
> > >> > Hi all,
> > >> >     Currently, when creating a CarbonData table, if a dimension
> > >> > is not added to the dictionary_exclude property, it is treated
> > >> > as a dictionary column by default. I think the default should be
> > >> > no dictionary.
> > >> >
> > >> >     For example, when I did a POC for one customer, the table had
> > >> > 300 columns and 200 dimensions, but only 5 columns were used for
> > >> > filter, so he only needs to set these 5 columns as dictionary
> > >> > and leave the other 195 columns as no dictionary. But now he
> > >> > needs to list the 195 columns in the dictionary_exclude
> > >> > property, which wastes time and makes the create table command
> > >> > huge, and also impacts the load performance.
> > >> >
> > >> >     So I suggest the dimension default should be no dictionary;
> > >> > this also helps customers easily see which dictionary columns
> > >> > are useful.
> > >> >
> > >> > --
> > >> > View this message in context:
> > >> > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010.html
> > >> > Sent from the Apache CarbonData Mailing List archive mailing list
> > >> > archive at Nabble.com.
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> Ravi
> >
> > --
> > View this message in context:
> > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8027.html
> > Sent from the Apache CarbonData Mailing List archive mailing list
> > archive at Nabble.com.
>
> --
> Thanks & Regards,
> Ravi

--
Regards
Liang
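
For reference, the 'ALL' semantics proposed in this thread can be sketched in a few lines of Python. This is only an illustration of the idea in Ravindra's two cases, not CarbonData's actual DDL parsing: the function name split_columns and its error handling are hypothetical, and treating unlisted columns as excluded when neither property is 'ALL' is an assumption the thread does not settle.

```python
def split_columns(columns, include, exclude):
    """Split `columns` into (included, excluded), keeping the original order.

    `include` and `exclude` are property values as written in the DDL:
    either a comma-separated column list or the keyword 'ALL'.
    Hypothetical helper; it only mirrors the proposal in this thread.
    """
    def parse(value):
        return [c.strip() for c in value.split(",") if c.strip()]

    include_all = include.strip().upper() == "ALL"
    exclude_all = exclude.strip().upper() == "ALL"
    if include_all and exclude_all:
        raise ValueError("only one of the two properties may be 'ALL'")

    inc = set() if include_all else set(parse(include))
    exc = set() if exclude_all else set(parse(exclude))

    unknown = (inc | exc) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if inc & exc:
        raise ValueError(f"columns listed on both sides: {sorted(inc & exc)}")

    if include_all:
        # 'ALL' on the include side: everything except the listed exceptions.
        included = [c for c in columns if c not in exc]
    else:
        # Otherwise only explicitly listed columns are included.
        # Assumption: columns listed on neither side count as excluded.
        included = [c for c in columns if c in inc]
    excluded = [c for c in columns if c not in included]
    return included, excluded


columns = ["C1", "C2", "C3", "C4", "C5", "C6"]

# Case 1: explicit sort columns, DICTIONARY_EXCLUDE='ALL' with one exception.
sort_cols, non_sort_cols = split_columns(columns, "C1,C2,C3", "C4,C5,C6")
dict_cols, no_dict_cols = split_columns(sort_cols, "C3", "ALL")
print(sort_cols, dict_cols, no_dict_cols)
```

With the customer scenario from the original mail (200 dimensions, 5 of them dictionary), this style means the DDL lists only the 5 exceptions next to DICTIONARY_EXCLUDE='ALL' instead of enumerating the other 195 columns.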