Thank you, Seshu and Luke.

@luke let's discuss the details of the Kylin tech blog series tomorrow

On Mon, Aug 3, 2015 at 10:22 PM, Adunuthula, Seshu <[email protected]>
wrote:

> Hongbin,
>
> Nice explanation. Would make a great blog article.
>
> Regards
> Seshu
>
> On 8/2/15, 8:32 PM, "hongbin ma" <[email protected]> wrote:
>
> >Hi Alex,
> >
> >I'm not quite following this thread. It looks like, apart from Jason
> >recommending the SlideShare link, no one has replied to you. To whom are
> >you writing? (Is someone replying to you privately instead of on the dev
> >list?)
> >
> >Could you re-summarize your problem? I might be able to help. It seems
> >you're confused by hierarchies and derived columns, so here are some
> >take-aways; I'm currently turning them into formal docs:
> >
> >*Hierarchies:*
> >
> >Theoretically, N dimensions give you 2^N dimension combinations. However,
> >for some groups of dimensions there is no need to build that many
> >combinations. For example, suppose you have three dimensions: continent,
> >country, city (in a hierarchy, the "bigger" dimension comes first). For
> >drill-down analysis you only need the following three group-by
> >combinations:
> >
> >group by continent
> >group by continent, country
> >group by continent, country, city
> >
> >In such cases the combination count is reduced from 2^3=8 to 3, which is a
> >great optimization. The same applies to the YEAR, QUARTER, MONTH, DATE
> >case.
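> >
> >To make that concrete, here is a minimal Python sketch (just an
> >illustration of the pruning rule, not Kylin's actual code): a combination
> >is kept only if every hierarchy member it contains also has all of its
> >ancestors present.
> >
> >from itertools import combinations
> >
> ># Hierarchy order: the "bigger" dimension comes first.
> >hierarchy = ["continent", "country", "city"]
> >
> >def valid(combo):
> >    # Keep a combination only if its hierarchy members form a prefix,
> >    # i.e. nothing appears without all of its ancestors.
> >    included = [d for d in hierarchy if d in combo]
> >    return included == hierarchy[:len(included)]
> >
> ># All 2^3 - 1 = 7 non-empty combinations, then apply the rule.
> >all_combos = [set(c) for n in range(1, 4)
> >              for c in combinations(hierarchy, n)]
> >kept = [c for c in all_combos if valid(c)]
> >print(len(all_combos), "->", len(kept))   # prints "7 -> 3"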
> >
> >If we denote the hierarchy dimensions as H1, H2, H3, typical scenarios
> >would be:
> >
> >*A. Hierarchies on lookup table*
> >
> >Fact  Table                            (joins)         Lookup Table
> >===================                         =============
> >column1,column2,,,,,, FK                         PK,,H1,H2,H3,,,,
> >
> >*B. Hierarchies on fact table*
> >
> >Fact  Table
> >===========================
> >column1,column2,,,H1,H2,H3,,,,,,,
> >
> >There is a special case of scenario A, where the PK of the lookup table
> >happens to be part of the hierarchy. For example, we may have a calendar
> >lookup table where cal_dt is the primary key:
> >
> >*A*. Hierarchies on lookup table over its primary key*
> >
> >Lookup Table(Calendar)
> >==============================================
> >cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
> >
> >For cases like A*, what you need is another optimization called "Derived
> >Columns":
> >
> >*Derived Columns:*
> >
> >A derived column is used when one or more dimensions (they must be
> >dimensions on the lookup table; these columns are called "derived") can be
> >deduced from another column (usually the corresponding FK, which is called
> >the "host column").
> >
> >For example, suppose we have a lookup table that we join with the fact
> >table on "where DimA = DimX". Note that in Kylin, if you choose an FK as a
> >dimension, the corresponding PK automatically becomes queryable, at no
> >extra cost. The secret is that since FK and PK values are always
> >identical, Kylin can apply filters/group-bys on the FK first and
> >transparently replace them with the PK. This means that if we want
> >DimA(FK), DimX(PK), DimB, DimC in our cube, we can safely choose only
> >DimA, DimB, DimC.
> >
> >
> >Fact  Table                            (joins)         Lookup Table
> >========================                        =============
> >column1,column2,,,,,, DimA(FK)                  DimX(PK),,DimB,DimC
> >
> >Let's say that DimA (the dimension representing the FK/PK) has a special
> >mapping to DimB:
> >
> >dimA    dimB  dimC
> >1           a        ?
> >2           b        ?
> >3           c        ?
> >4           a        ?
> >
> >In this case, given a value of DimA, the value of DimB is determined, so
> >we say DimB can be derived from DimA. When we build a cube that contains
> >both DimA and DimB, we simply include DimA and mark DimB as derived. The
> >derived column (DimB) does not participate in cuboid generation:
> >
> >original combinations:
> >ABC,AB,AC,BC,A,B,C
> >
> >combinations when deriving B from A:
> >AC,A,C
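> >
> >A tiny Python sketch of the same rule (illustration only, not Kylin's
> >code): derived columns are simply excluded before cuboids are enumerated.
> >
> >from itertools import combinations
> >
> >dims = ["A", "B", "C"]
> >derived = {"B": "A"}    # B is derived from its host column A
> >
> ># Only non-derived dimensions take part in cuboid generation.
> >physical = [d for d in dims if d not in derived]
> >
> >cuboids = ["".join(c) for n in range(len(physical), 0, -1)
> >           for c in combinations(physical, n)]
> >print(cuboids)          # ['AC', 'A', 'C']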
> >
> >At runtime, a query like "select count(*) from fact_table inner join
> >lookup1 group by lookup1.dimB" would normally expect a cuboid containing
> >DimB to answer it. However, DimB will appear in NONE of the cuboids
> >because of the derived optimization. In this case we modify the execution
> >plan to make it group by DimA (its host column) first, and we'll get an
> >intermediate answer like:
> >
> >DimA  count(*)
> >1          1
> >2          1
> >3          1
> >4          1
> >
> >Afterwards, Kylin will replace DimA values with DimB values (since both
> >sets of values are in the lookup table, Kylin can load the whole lookup
> >table into memory and build a mapping between them), and the intermediate
> >result becomes:
> >
> >DimB  count(*)
> >a          1
> >b          1
> >c          1
> >a          1
> >
> >After this, the runtime SQL engine (Calcite) will further aggregate the
> >intermediate result to:
> >
> >DimB  count(*)
> >a          2
> >b          1
> >c          1
> >
> >This step happens at query runtime; this is what is meant by "at the cost
> >of extra runtime aggregation".
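> >
> >A rough Python sketch of those three steps (again only an illustration of
> >the idea, not the real implementation):
> >
> >from collections import defaultdict
> >
> ># Lookup table loaded into memory: host column DimA -> derived DimB.
> >dim_a_to_dim_b = {1: "a", 2: "b", 3: "c", 4: "a"}
> >
> ># Step 1: the cuboid only contains DimA, so the storage engine returns
> ># the "group by DimA" result (the first intermediate table above).
> >by_dim_a = {1: 1, 2: 1, 3: 1, 4: 1}
> >
> ># Step 2: replace each DimA value with its DimB value.
> >replaced = [(dim_a_to_dim_b[a], cnt) for a, cnt in by_dim_a.items()]
> >
> ># Step 3: the SQL engine (Calcite) aggregates again on DimB.
> >by_dim_b = defaultdict(int)
> >for b, cnt in replaced:
> >    by_dim_b[b] += cnt
> >
> >print(dict(by_dim_b))   # {'a': 2, 'b': 1, 'c': 1}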
> >
> >
> >
> >
> >On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]>
> wrote:
> >
> >> Sorry to be a bit annoying with this topic, but I have tried different
> >> cubes / hierarchies and can never get the join to work.
> >>
> >> Without this I basically cannot use Kylin on PROD for my project.
> >>
> >> The stack trace:
> >>
> >> http-bio-7070-exec-3]:[2015-07-31
> >>
> >>
> >>09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.han
> >>dleError(BasicController.java:52)]
> >> -
> >>
> >> org.apache.kylin.rest.exception.InternalErrorException: Can't find any
> >> realization. Please confirm with providers. SQL digest: fact table
> >> DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
> >> aggregates[].
> >>
> >> while executing SQL: "select id from sample_dim group by id LIMIT 50000"
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryCon
> >>troller.java:223)
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.
> >>java:174)
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.controller.QueryController.query(QueryController.ja
> >>va:91)
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039
> >>d0b.invoke(<generated>)
> >>
> >>         at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> >>
> >>         at
> >>
> >>
> >>org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.in
> >>vokeJoinpoint(Cglib2AopProxy.java:689)
> >>
> >>         at
> >>
> >>
> >>org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
> >>ectiveMethodInvocation.java:150)
> >>
> >>         at
> >>
> >>
> >>com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodIn
> >>terceptor.java:48)
> >>
> >>         at
> >>
> >>
> >>com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodIn
> >>terceptor.java:34)
> >>
> >>         at
> >>
> >>
> >>com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(Abst
> >>ractMetricMethodInterceptor.java:59)
> >>
> >>         at
> >>
> >>
> >>org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
> >>ectiveMethodInvocation.java:172)
> >>
> >>         at
> >>
> >>
> >>org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedIntercepto
> >>r.intercept(Cglib2AopProxy.java:622)
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b6079
> >>24.query(<generated>)
> >>
> >>         at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
> >>
> >>         at
> >>
> >>
> >>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
> >>mpl.java:43)
> >>
> >>         at java.lang.reflect.Method.invoke(Method.java:606)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.method.support.InvocableHandlerMethod.invoke(Invo
> >>cableHandlerMethod.java:213)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.method.support.InvocableHandlerMethod.invokeForRe
> >>quest(InvocableHandlerMethod.java:126)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHan
> >>dlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandl
> >>erAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandl
> >>erAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.h
> >>andle(AbstractHandlerMethodAdapter.java:80)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherSe
> >>rvlet.java:923)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.DispatcherServlet.doService(DispatcherSer
> >>vlet.java:852)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.FrameworkServlet.processRequest(Framework
> >>Servlet.java:882)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.
> >>java:789)
> >>
> >>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
> >>
> >>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:303)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >> org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:241)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >>
> >>
> >>com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(Abstract
> >>InstrumentedFilter.java:97)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:241)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:330)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.access.intercept.FilterSecurityIntercept
> >>or.invoke(FilterSecurityInterceptor.java:118)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.access.intercept.FilterSecurityIntercept
> >>or.doFilter(FilterSecurityInterceptor.java:84)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.access.ExceptionTranslationFilter.doFilt
> >>er(ExceptionTranslationFilter.java:113)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.session.SessionManagementFilter.doFilter
> >>(SessionManagementFilter.java:103)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>        at
> >>
> >>
> >>org.springframework.security.web.authentication.AnonymousAuthenticationFi
> >>lter.doFilter(AnonymousAuthenticationFilter.java:113)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.servletapi.SecurityContextHolderAwareReq
> >>uestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doF
> >>ilter(RequestCacheAwareFilter.java:45)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.authentication.www.BasicAuthenticationFi
> >>lter.doFilter(BasicAuthenticationFilter.java:150)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.authentication.ui.DefaultLoginPageGenera
> >>tingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.authentication.AbstractAuthenticationPro
> >>cessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.authentication.logout.LogoutFilter.doFil
> >>ter(LogoutFilter.java:105)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.context.SecurityContextPersistenceFilter
> >>.doFilter(SecurityContextPersistenceFilter.java:87)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
> >>lter(FilterChainProxy.java:342)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy.doFilterInternal(Filter
> >>ChainProxy.java:192)
> >>
> >>         at
> >>
> >>
> >>org.springframework.security.web.FilterChainProxy.doFilter(FilterChainPro
> >>xy.java:160)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(Deleg
> >>atingFilterProxy.java:346)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingF
> >>ilterProxy.java:259)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:241)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >>
> >>
> >>org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilt
> >>er.java:64)
> >>
> >>         at
> >>
> >>
> >>org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerReque
> >>stFilter.java:76)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:241)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
> >>
> >>         at
> >> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
> >>ionFilterChain.java:241)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
> >>rChain.java:208)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve
> >>.java:220)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve
> >>.java:122)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorB
> >>ase.java:504)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:
> >>170)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:
> >>103)
> >>
> >>         at
> >>
> >>org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> >>
> >>         at
> >>
> >>
> >>org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.j
> >>ava:116)
> >>
> >>         at
> >>
> >>org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:42
> >>1)
> >>
> >>         at
> >>
> >>
> >>org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Pr
> >>ocessor.java:1074)
> >>
> >>         at
> >>
> >>
> >>org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abst
> >>ractProtocol.java:611)
> >>
> >>         at
> >>
> >>
> >>org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.ja
> >>va:316)
> >>
> >>         at
> >>
> >>
> >>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
> >>:1145)
> >>
> >>         at
> >>
> >>
> >>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
> >>a:615)
> >>
> >>         at
> >>
> >>
> >>org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread
> >>.java:61)
> >>
> >>         at java.lang.Thread.run(Thread.java:744)
> >>
> >>
> >> The result for "select * from sample_dim":
> >>
> >> ID,DIM1,DIM2
> >>
> >> 33814,NYC,USA
> >>
> >> 201431,PARIS,FRANCE
> >>
> >> etc.
> >>
> >>
> >>
> >> On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]>
> >>wrote:
> >>
> >> > So with 0.7.2 the cube builds, and I can see some improvement:
> >> >
> >> > "select * from SAMPLE_DIM" now returns all the fields, i.e.:
> >> >
> >> > dim1, dim2, dim3, etc., SAMPLE_ID
> >> >
> >> > and I can see all the values for each field.
> >> >
> >> > However the join between the fact table and the lookup table still
> >> > does not work; it returns:
> >> >
> >> >     Can't find any realization.
> >> >
> >> > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it
> >> > also returns:
> >> >
> >> >     Can't find any realization.
> >> >
> >> > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I
> >> > get the list of all SAMPLE_ID as expected.
> >> >
> >> > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the
> >> > list of all dim1 as expected.
> >> >
> >> > The same exact query works perfectly on Hive (although it takes a long
> >> > time to be processed of course).
> >> >
> >> > Am I doing something wrong?
> >> >
> >> > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]>
> >> wrote:
> >> >
> >> >> Ok I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
> >> >> right?
> >> >>
> >> >> I upgraded today to 0.7.2 and hope that solves the problem.
> >> >>
> >> >> Regards
> >> >>
> >> >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> I still don't understand this.
> >> >>>
> >> >>> I have a simple fact table and a simple SAMPLE_DIM lookup table.
> >> >>> They are joined on SAMPLE_ID.
> >> >>>
> >> >>> If I do as you say and include all the columns of SAMPLE_DIM as a
> >> >>> hierarchy and do not include the SAMPLE_ID, then the cube builds
> >> >>> successfully but I cannot query with the hierarchy. Any join
> >> >>> results in this error:
> >> >>>
> >> >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
> >> >>>
> >> >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the
> >> >>> hierarchy but not the SAMPLE_ID used to join with the fact table.
> >> >>>
> >> >>> If I include the SAMPLE_ID in the hierarchy definition then the cube
> >> >>> build fails on step 3 with:
> >> >>>
> >> >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID
> >> >>> does not exist in row key desc
> >> >>> at
> >> org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:16
> >>3)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
> >>yGeneratorCLI.java:51)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
> >>yGeneratorCLI.java:42)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionary
> >>Job.java:53)
> >> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecu
> >>table.java:63)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
> >>ble.java:107)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultCha
> >>inedExecutable.java:50)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
> >>ble.java:107)
> >> >>> at
> >> >>>
> >>
> >>org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defau
> >>ltScheduler.java:132)
> >> >>> at
> >> >>>
> >>
> >>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
> >>:1145)
> >> >>> at
> >> >>>
> >>
> >>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
> >>a:615)
> >> >>> at java.lang.Thread.run(Thread.java:744)
> >> >>>
> >> >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
> >> >>>
> >> >>> The only scenario where I could make it work is when I also create
> >> >>> a derived dimension SAMPLE_ID / something else; then somehow the
> >> >>> SAMPLE_ID is included and can be queried.
> >> >>>
> >> >>> Any help with that?
> >> >>>
> >> >>>
> >> >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]>
> >> >>> wrote:
> >> >>>
> >> >>>> Thanks for the answer,
> >> >>>>
> >> >>>> Indeed I had a look at these slides before, and they are great for
> >> >>>> understanding the high-level concepts, but I still ended up
> >> >>>> spending quite some time on the issues mentioned below when
> >> >>>> designing my dimensions.
> >> >>>>
> >> >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong
> >> >>>> <[email protected]> wrote:
> >> >>>>
> >> >>>>> Hi Alex,
> >> >>>>>
> >> >>>>> We have a slide deck to help you understand how to build a cube;
> >> >>>>> I don't know whether you have read it. It will help you understand
> >> >>>>> derived and hierarchy dimensions.
> >> >>>>>
> >> >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> >> >>>>>
> >> >>>>> For your case about the hierarchy, log_date should not be included
> >> >>>>> in the hierarchy; there is a bug here that you helped find, and we
> >> >>>>> will follow up on it.
> >> >>>>>
> >> >>>>> Also, more documentation and UI enhancements will be done to help
> >> >>>>> users build cubes easily.
> >> >>>>>
> >> >>>>> Thanks!!
> >> >>>>>
> >> >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo
> >> >>>>> <[email protected]> wrote:
> >> >>>>>
> >> >>>>> > I am trying to create a simple cube with a fact table and 3
> >> >>>>> > dimensions.
> >> >>>>> >
> >> >>>>> > I have read the different slideshares and wiki pages, but I
> >> >>>>> > found that the documentation is not very specific on how to
> >> >>>>> > manage hierarchies.
> >> >>>>> >
> >> >>>>> > Let's take this simple example:
> >> >>>>> >
> >> >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> >> >>>>> >
> >> >>>>> > Date lookup table : logDate, week, month, quarter, etc.
> >> >>>>> >
> >> >>>>> > I specified a Left join on logDate. Actually, when I specify
> >> >>>>> > this I find it not very clear which one is considered the Left
> >> >>>>> > table and which one the Right table. I assumed the Fact table
> >> >>>>> > was the left table and the Lookup table the right table; looking
> >> >>>>> > at it now I think that might be a mistake (I am just interested
> >> >>>>> > in dates for which there are results in the fact table).
> >> >>>>> >
> >> >>>>> > If I use the auto generator it creates a derived dimension; I
> >> >>>>> > don't think that's what I need.
> >> >>>>> >
> >> >>>>> > So I created a hierarchy, but again to me it's not clearly
> >> >>>>> > indicated whether I should create ["quarter", "month", "week",
> >> >>>>> > "log_date"] or ["logDate", "week", "month", "quarter"].
> >> >>>>> >
> >> >>>>> > Also, should I include log_date in the hierarchy? To me it was
> >> >>>>> > more intuitive not to include it because it's already the join
> >> >>>>> > key, but then the cube is created without it and I cannot query
> >> >>>>> > by date; it says that "log_date" is not found in the date table
> >> >>>>> > (it is in the Hive table but not in the built cube). If I
> >> >>>>> > include it in the hierarchy the cube build fails with this
> >> >>>>> > error:
> >> >>>>> >
> >> >>>>> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE
> >> >>>>> > does not exist in row key desc
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >> org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:16
> >>3)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
> >>yGeneratorCLI.java:51)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
> >>yGeneratorCLI.java:42)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionary
> >>Job.java:53)
> >> >>>>> >         at
> >> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>>>> >         at
> >> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecu
> >>table.java:63)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
> >>ble.java:107)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultCha
> >>inedExecutable.java:50)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
> >>ble.java:107)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defau
> >>ltScheduler.java:132)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
> >>:1145)
> >> >>>>> >         at
> >> >>>>> >
> >> >>>>>
> >>
> >>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
> >>a:615)
> >> >>>>> >         at java.lang.Thread.run(Thread.java:744)
> >> >>>>> >
> >> >>>>> > result code:2
> >> >>>>> >
> >> >>>>> >
> >> >>>>> > I think it might be useful to improve the documentation to
> >> >>>>> > explain this more clearly, and not just the basic steps, because
> >> >>>>> > building a cube even on short time ranges takes some time, so
> >> >>>>> > learning by trial / error is very time consuming.
> >> >>>>> >
> >> >>>>> > Same thing for the derived dimensions: should I include
> >> >>>>> > ["storeID", "storeName"] or just ["storeName"]? The second
> >> >>>>> > option seems to work for me.
> >> >>>>> >
> >> >>>>> > Thanks
> >> >>>>> >
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
> >
> >--
> >Regards,
> >
> >*Bin Mahone | 马洪宾*
> >Apache Kylin: http://kylin.io
> >Github: https://github.com/binmahone
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
