Thank you Seshu and Luke. @luke let's discuss the details of the Kylin tech blog series tomorrow.
On Mon, Aug 3, 2015 at 10:22 PM, Adunuthula, Seshu <[email protected]> wrote:

> Hongbin,
>
> Nice explanation. Would make a great blog article.
>
> Regards
> Seshu
>
> On 8/2/15, 8:32 PM, "hongbin ma" <[email protected]> wrote:
>
> >hi alex,
> >
> >I'm not quite following this thread. It looks like, except for Jason
> >recommending the slideshare link, no one has replied to you. To whom are
> >you communicating? (Is someone replying to you in person instead of on
> >the dev list?)
> >
> >Can you summarize your problem again? I might be able to help. I see
> >you're confused by hierarchies and derived columns; here are some
> >takeaways that I'm currently summarizing into formal docs:
> >
> >*Hierarchies:*
> >
> >Theoretically, for N dimensions you'll end up with 2^N dimension
> >combinations. However, for some groups of dimensions there is no need to
> >create so many combinations. For example, if you have three dimensions:
> >continent, country, city (in hierarchies, the "bigger" dimension comes
> >first), you will only need the following three group-by combinations
> >when you do drill-down analysis:
> >
> >group by continent
> >group by continent, country
> >group by continent, country, city
> >
> >In such cases the combination count is reduced from 2^3=8 to 3, which is
> >a great optimization. The same goes for the YEAR, QUARTER, MONTH, DATE
> >case.
> >
> >If we denote the hierarchy dimensions as H1, H2, H3, typical scenarios
> >would be:
> >
> >*A. Hierarchies on lookup table*
> >
> >Fact Table (joins) Lookup Table
> >=================== =============
> >column1,column2,,,,,, FK PK,,H1,H2,H3,,,,
> >
> >*B. Hierarchies on fact table*
> >
> >Fact Table
> >===========================
> >column1,column2,,,H1,H2,H3,,,,,,,
> >
> >There is a special case for scenario A, where the PK on the lookup table
> >happens to be part of the hierarchy. For example, we have a calendar
> >lookup table where cal_dt is the primary key:
> >
> >*A*.
Hierarchies on lookup table over its primary key*
> >
> >Lookup Table(Calendar)
> >==============================================
> >cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
> >
> >For cases like A*, what you need is another optimization called
> >"Derived Columns".
> >
> >*Derived Columns:*
> >
> >Derived columns are used when one or more dimensions (they must be
> >dimensions on the lookup table; these columns are called "derived") can
> >be deduced from another (usually the corresponding FK; this is called
> >the "host column").
> >
> >For example, suppose we have a lookup table that we join with the fact
> >table on "where DimA = DimX". Notice that in Kylin, if you choose the
> >FK as a dimension, the corresponding PK will automatically be
> >queryable, without any extra cost. The secret is that since FK and PK
> >values are always identical, Kylin can apply filters/group-bys on the
> >FK first, and transparently replace them with the PK. This means that
> >if we want DimA(FK), DimX(PK), DimB, DimC in our cube, we can safely
> >choose only DimA, DimB, DimC.
> >
> >Fact Table (joins) Lookup Table
> >======================== =============
> >column1,column2,,,,,, DimA(FK) DimX(PK),,DimB,DimC
> >
> >Let's say that DimA (the dimension representing the FK/PK) has a
> >special mapping to DimB:
> >
> >dimA dimB dimC
> >1 a ?
> >2 b ?
> >3 c ?
> >4 a ?
> >
> >In this case, given a value of DimA, the value of DimB is determined,
> >so we say DimB can be derived from DimA. When we build a cube that
> >contains both DimA and DimB, we simply include DimA and mark DimB as
> >derived. The derived column (DimB) does not participate in cuboid
> >generation:
> >
> >original combinations:
> >ABC,AB,AC,BC,A,B,C
> >
> >combinations when deriving B from A:
> >AC,A,C
> >
> >At runtime, for queries like "select count(*) from fact_table inner
> >join lookup1 group by lookup1.dimB", a cuboid containing DimB is
> >expected to answer the query.
> >However, DimB will appear in NONE of the cuboids due to the derived
> >optimization. In this case, we modify the execution plan to make it
> >group by DimA (its host column) first, and we'll get an intermediate
> >answer like:
> >
> >DimA count(*)
> >1 1
> >2 1
> >3 1
> >4 1
> >
> >Afterwards, Kylin will replace the DimA values with DimB values (since
> >both of their values are in the lookup table, Kylin can load the whole
> >lookup table into memory and build a mapping between them), and the
> >intermediate result becomes:
> >
> >DimB count(*)
> >a 1
> >b 1
> >c 1
> >a 1
> >
> >After this, the runtime SQL engine (Calcite) will further aggregate the
> >intermediate result to:
> >
> >DimB count(*)
> >a 2
> >b 1
> >c 1
> >
> >This step happens at query runtime; this is what "at the cost of extra
> >runtime aggregation" means.
> >
> >On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]> wrote:
> >
> >> Sorry to be a bit annoying with the topic, but I tried different
> >> cubes / hierarchies and can never join.
> >>
> >> Without this, basically I cannot use Kylin on PROD for my project.
> >>
> >> The stack trace:
> >>
> >> [http-bio-7070-exec-3]:[2015-07-31
> >> 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)]
> >> -
> >> org.apache.kylin.rest.exception.InternalErrorException: Can't find any
> >> realization. Please confirm with providers. SQL digest: fact table
> >> DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
> >> aggregates[].
> >>
> >> while executing SQL: "select id from sample_dim group by id LIMIT 50000"
> >>
> >> at org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryController.java:223)
> >> at org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.
> >>java:174)
> >> at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:91)
> >> at org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039d0b.invoke(<generated>)
> >> at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> >> at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
> >> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> >> at com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:48)
> >> at com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:34)
> >> at com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(AbstractMetricMethodInterceptor.java:59)
> >> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> >> at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
> >> at org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b607924.query(<generated>)
> >> at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:606)
> >> at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
> >> at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
> >> at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
> >> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
> >> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> >> at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
> >> at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
> >> at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
> >> at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
> >> at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
> >> at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
> >> at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:97)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
> >> at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
> >> at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.authentication.ui.DefaultLoginPageGeneratingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
> >> at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >> at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192)
> >> at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160)
> >> at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
> >> at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:64)
> >> at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
> >> at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> >> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> >> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> >> at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
> >> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> >> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> >> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> >> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> >> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> >> at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
> >> at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> >> at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >> at java.lang.Thread.run(Thread.java:744)
> >>
> >> The result for "select * from sample_dim":
> >>
> >> ID,DIM1,DIM2
> >> 33814,NYC,USA
> >> 201431,PARIS,FRANCE
> >> etc.
> >>
> >> On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]> wrote:
> >>
> >> > So with 0.7.2 the cube builds, and I can see some improvement:
> >> >
> >> > "select * from SAMPLE_DIM" now returns all the fields, i.e.
> >> > dim1, dim2, dim3, etc., SAMPLE_ID, and I can see all the values for
> >> > each field.
> >> >
> >> > However, the join between the fact table and the lookup table still
> >> > does not work; it returns:
> >> >
> >> > Can't find any realization.
> >> >
> >> > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it
> >> > also returns:
> >> >
> >> > Can't find any realization.
> >> >
> >> > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I
> >> > get the list of all SAMPLE_ID as expected.
> >> >
> >> > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the
> >> > list of all dim1 as expected.
> >> >
> >> > The exact same query works perfectly on Hive (although it takes a
> >> > long time to be processed, of course).
> >> >
> >> > Am I doing something wrong?
> >> >
> >> > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]> wrote:
> >> >
> >> >> Ok, I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
> >> >> right?
> >> >>
> >> >> I upgraded today to 0.7.2 and hope it solves the problem.
> >> >>
> >> >> Regards
> >> >>
> >> >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]> wrote:
> >> >>
> >> >>> I still don't understand this.
> >> >>>
> >> >>> I have a simple fact table and a simple SAMPLE_DIM lookup table.
> >> >>> They are joined on SAMPLE_ID.
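The derived-column rewrite described earlier in the thread (group by the host column first, then map host values to derived values and re-aggregate) can be illustrated with a short Python sketch. This is a toy model of the idea using the DimA/DimB example values from the explanation above, not Kylin's actual code:

```python
from collections import Counter

# Toy lookup table from the example above: host column DimA (FK/PK)
# maps to the derived column DimB.
lookup = {1: "a", 2: "b", 3: "c", 4: "a"}

# Step 1: the cuboid can only answer "group by DimA" (the host column),
# because the derived DimB appears in no cuboid.
intermediate = {1: 1, 2: 1, 3: 1, 4: 1}  # DimA -> count(*)

# Step 2: replace DimA values with DimB values via the in-memory lookup,
# then re-aggregate -- this is the "extra runtime aggregation".
final = Counter()
for dim_a, count in intermediate.items():
    final[lookup[dim_a]] += count

print(dict(final))  # {'a': 2, 'b': 1, 'c': 1}
```

The second step is cheap because the whole lookup table fits in memory, which is why derived columns trade a little query-time work for a big reduction in cuboid count.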
> >> >>>
> >> >>> If I do as you say and include all the columns of SAMPLE_DIM as a
> >> >>> hierarchy, and do not include the SAMPLE_ID, then the cube builds
> >> >>> successfully but I cannot query with the hierarchy. Any join
> >> >>> results in this error:
> >> >>>
> >> >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
> >> >>>
> >> >>> Indeed, if I do a select * from 'SAMPLE_DIM' I can see all the
> >> >>> hierarchy columns but not the SAMPLE_ID used to join with the fact
> >> >>> table.
> >> >>>
> >> >>> If I include the SAMPLE_ID in the hierarchy definition, then the
> >> >>> cube build fails on step 3 with:
> >> >>>
> >> >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID
> >> >>> does not exist in row key desc
> >> >>> at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> >> >>> at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> >> >>> at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> >> >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> >> >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >> >>> at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >> >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >> >>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >> >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >> >>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >> >>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >>> at java.lang.Thread.run(Thread.java:744)
> >> >>>
> >> >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
> >> >>>
> >> >>> The only scenario where I could make it work is when I also create
> >> >>> a derived dimension SAMPLE_ID / something else; then somehow the
> >> >>> SAMPLE_ID is included and can be queried.
> >> >>>
> >> >>> Any help with that?
> >> >>>
> >> >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]> wrote:
> >> >>>
> >> >>>> Thanks for the answer,
> >> >>>>
> >> >>>> Indeed I had a look at these slides before, and they are great for
> >> >>>> understanding the high-level concepts, but I ended up spending
> >> >>>> quite some time on the issues mentioned below when designing my
> >> >>>> dimensions.
> >> >>>>
> >> >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <[email protected]> wrote:
> >> >>>>
> >> >>>>> Hi Alex,
> >> >>>>>
> >> >>>>> We have a slide to help you understand how to build a cube. I
> >> >>>>> don't know whether you have read this? It will help you
> >> >>>>> understand derived and hierarchy dimensions.
> >> >>>>>
> >> >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> >> >>>>>
> >> >>>>> For your case about hierarchies, log_date should not be included
> >> >>>>> in the hierarchy. There's a bug here that you helped find; we
> >> >>>>> will follow up on it.
> >> >>>>>
> >> >>>>> Also, more documentation and UI enhancements will be done to help
> >> >>>>> users build cubes easily.
> >> >>>>>
> >> >>>>> Thanks!!
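The cuboid arithmetic behind the hierarchy advice in this thread (2^N combinations collapsing to just the prefixes of the hierarchy) can be checked with a small Python sketch. This is a toy illustration of the counting for the continent/country/city example, not Kylin's implementation:

```python
from itertools import combinations

def all_cuboids(dims):
    """Every non-empty group-by combination of the given dimensions."""
    return [set(c) for r in range(1, len(dims) + 1)
            for c in combinations(dims, r)]

def respects_hierarchy(cuboid, hierarchy):
    """Only prefixes of the hierarchy are allowed: (continent, country)
    is fine, but (city,) or (continent, city) alone are not."""
    used = [d for d in hierarchy if d in cuboid]
    return used == list(hierarchy[:len(used)])

hierarchy = ("continent", "country", "city")
cuboids = all_cuboids(hierarchy)
valid = [c for c in cuboids if respects_hierarchy(c, hierarchy)]

print(len(cuboids))  # 7 non-empty combinations without the optimization
print(len(valid))    # 3: the prefixes of the hierarchy
```

With more dimensions alongside the hierarchy the same prefix rule applies per cuboid, which is where the large reduction in cube size comes from.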
> >> >>>>>
> >> >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]> wrote:
> >> >>>>>
> >> >>>>> > I am trying to create a simple cube with a fact table and 3
> >> >>>>> > dimensions.
> >> >>>>> >
> >> >>>>> > I have read the different slideshares and wiki pages, but I
> >> >>>>> > found that the documentation is not very specific on how to
> >> >>>>> > manage hierarchies.
> >> >>>>> >
> >> >>>>> > Let's take this simple example:
> >> >>>>> >
> >> >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> >> >>>>> >
> >> >>>>> > Date lookup table: logDate, week, month, quarter, etc.
> >> >>>>> >
> >> >>>>> > I specified a left join on logDate. Actually, when I specify
> >> >>>>> > this, I find it not very clear which one is considered to be
> >> >>>>> > the left table and which one the right table. I assumed the
> >> >>>>> > fact table was the left table and the lookup table the right
> >> >>>>> > table; looking at it now I think that might be a mistake (I am
> >> >>>>> > just interested in dates for which there are results in the
> >> >>>>> > fact table).
> >> >>>>> >
> >> >>>>> > If I use the auto generator it creates a derived dimension; I
> >> >>>>> > don't think that's what I need.
> >> >>>>> >
> >> >>>>> > So I created a hierarchy, but again to me it's not clearly
> >> >>>>> > indicated whether I should create ["quarter", "month", "week",
> >> >>>>> > "log_date"] or ["logDate", "week", "month", "quarter"]?
> >> >>>>> >
> >> >>>>> > Also, should I include log_date in the hierarchy? To me it was
> >> >>>>> > more intuitive not to include it because it's already the join
> >> >>>>> > key, but then it created the cube without it and I cannot query
> >> >>>>> > by date; it says that "log_date" is not found in the date table
> >> >>>>> > (it is in the Hive table but not in the cube built).
If I include it in the hierarchy, the cube build fails with this error:
> >> >>>>> >
> >> >>>>> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE
> >> >>>>> > does not exist in row key desc
> >> >>>>> > at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> >> >>>>> > at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> >> >>>>> > at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> >> >>>>> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> >> >>>>> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >> >>>>> > at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >> >>>>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>>>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>>>> > at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >> >>>>> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >> >>>>> > at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >> >>>>> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >> >>>>> > at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >> >>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >>>>> > at java.lang.Thread.run(Thread.java:744)
> >> >>>>> >
> >> >>>>> > result code:2
> >> >>>>> >
> >> >>>>> > I think it might be useful to improve the documentation to
> >> >>>>> > explain this more clearly, and not just the basic steps,
> >> >>>>> > because building a cube even on short time ranges takes some
> >> >>>>> > time, so learning by trial and error is very time-consuming.
> >> >>>>> >
> >> >>>>> > Same thing for the derived dimensions: should I include
> >> >>>>> > ["storeID", "storeName"] or just ["storeName"]? The second
> >> >>>>> > option seems to work for me.
> >> >>>>> >
> >> >>>>> > Thanks
> >
> >--
> >Regards,
> >
> >*Bin Mahone | 马洪宾*
> >Apache Kylin: http://kylin.io
> >Github: https://github.com/binmahone

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
