Thank you -- very helpful.

Regarding limits on the number of dimensions: what are the compute/storage
constraints on this? For a given query:

* Where is the data stored?
* Which nodes is the computation occurring on?

I am trying to figure out -- if we have a large number of dimensions, what
part of the cloud-based Kylin needs to be increased? (I'm doing the setup from
the kylin4_on_cloud branch.)

Thanks,
WILL

On Tue, Oct 11, 2022 at 1:20 AM Xiaoxiang Yu <x...@apache.org> wrote:

> 1) The criteria for filtering (e.g. selecting sex='male') and grouping (e.g.
> group by state) should be dimensions - is this correct?
> Yes. In addition, Kylin has a limit of 63 dimensions at maximum. But you
> should be aware of 'The Curse of Dimensionality'.
>
> 2.1) Items that I would like to sum should be measures, is that right?
> Yes.
>
> 2.2) Is there a limit to the number of measures?
> No, there is no such limit.
>
> 3) Does Kylin support sum(expression)?
> From the MySQL doc
> https://dev.mysql.com/doc/refman/5.7/en/aggregate-functions.html#function_sum
> we know MySQL supports it.
> For Kylin, it should be supported in Kylin 3.x and in the future version
> 5.x. Unfortunately, Kylin 4.x does not support sum(expression), and Kylin 4.x
> is the version you are using.
>
> 4) Does Kylin support MEDIAN?
>
> Yes, Kylin should support it, but I didn't test it. In fact, Kylin has a
> measure PERCENTILE, and I think the 50th percentile is equal to MEDIAN, am I
> right?
>
> --
> *Best wishes to you ! *
> *From :**Xiaoxiang Yu*
>
> At 2022-10-11 14:03:14, "Will Glass-Husain" <wgl...@forio.com> wrote:
> >Hi,
> >
> >Thanks for the recent help as I set up my first Kylin system. I have a
> >question regarding proper design of a cube to run some
> >demographic queries. I want to make this accessible in a webapp, with
> >reasonable response time.
> >
> >I have a CSV file with about 80 columns on sex, race, state, age, internet
> >access, job, etc.
> >
> >Can you advise regarding proper cube design?
> >
> >1) The criteria for filtering (e.g. selecting sex='male') and grouping
> >(e.g. group by state) should be dimensions - is this correct?
> >
> >2) Items that I would like to sum should be measures, is that right?
> >Is there a limit to the number of measures? I want to report out up to 300
> >different measures aggregated by the dimensions.
> >
> >3)
> >In MySQL, I am querying for different values like this
> >
> >select SUM((married=1) * weight) as MARRIED_1, SUM((married=2) * weight) as
> >MARRIED_2 from data group by state;
> >
> >This returns the total number of weighted records for records where married
> >is 1 and where married is 2.
> >
> >Question - is there a way to do this in the Kylin query? Or do I need to
> >pre-compute my weights and create columns MARRIED_1 and MARRIED_2 in the
> >source data, then sum it in Kylin.
> >
> >4) This is a tricky one. Does Kylin support MEDIAN? In MySQL, there's no
> >MEDIAN function but we can calculate it by counting all the records, then
> >selecting the record at an offset of half the records. I want to
> >calculate "median" (not mean) for age and some other variables.
> >
> >Thanks for any tips.
> >
> >Best, WILL
> >
> >--
> >William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
> ><http://www.forio.com/>
>

--
William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
<http://www.forio.com/>
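
For reference, here is a minimal sketch of the pre-computation option raised in question 3: adding columns such as MARRIED_1 and MARRIED_2 to the source CSV so that Kylin only needs a plain SUM measure on each. The helper name `add_weighted_indicators` and the sample rows are hypothetical and for illustration only; the column names `married`, `weight`, and `state` follow the MySQL query in the thread.

```python
import csv
import io


def add_weighted_indicators(rows, column, values, weight="weight"):
    """Yield each row with extra columns like MARRIED_1 = (married == 1) * weight.

    Hypothetical helper: mirrors SUM((married=1) * weight) by storing the
    per-row product, so the cube can use an ordinary SUM over the new column.
    """
    for row in rows:
        out = dict(row)
        w = float(row[weight])
        for v in values:
            # The row's weight if it matches the value, otherwise 0.
            out[f"{column.upper()}_{v}"] = w if row[column] == str(v) else 0.0
        yield out


# Made-up sample data standing in for the real CSV file.
sample = io.StringIO("state,married,weight\nCA,1,2.5\nCA,2,1.0\nNY,1,4.0\n")
rows = list(add_weighted_indicators(csv.DictReader(sample), "married", [1, 2]))

# Quick check of the equivalent of "SUM(MARRIED_1) ... GROUP BY state".
totals = {}
for r in rows:
    totals[r["state"]] = totals.get(r["state"], 0.0) + r["MARRIED_1"]
```

With around 300 such measures, the same loop could be run once per source column before loading the data, at the cost of a much wider source table.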