Hi Maria

We do not have a formal doc for this yet.
I was planning to refine the materials here in this thread, and publish them
as a formal document if the materials here answer your concerns.

What are you expecting from the "formal doc"?

On Mon, Aug 3, 2015 at 2:37 PM, Gaspare Maria <[email protected]> wrote:

> Hi,
>
> Good explanation.
>
> Could you share the link to the formal doc you mentioned?
>
> A complete example showing these details would also help.
>
> Thanks.
>
> Gaspare Maria
>
> -------- Original message --------
> From: hongbin ma <[email protected]>
> Date: 03/08/2015  05:32  (GMT+01:00)
> To: dev <[email protected]>
> Subject: Re: Modeling hierarchies
>
> hi alex,
>
> I'm not quite following this thread. It looks like, apart from jason
> recommending the slideshare link, no one has replied to you. With whom are
> you communicating? (Is someone replying to you in person instead of
> to the dev list?)
>
> Can you summarize your problem again? I might be able to help you.
> It seems you're confused by hierarchies and derived columns, so here are some
> takeaways; I'm currently summarizing them into formal docs:
>
> *Hierarchies:*
>
> Theoretically for N dimensions you'll end up with 2^N dimension
> combinations. However, for some groups of dimensions there is no need to
> create so many combinations. For example, if you have three dimensions:
> continent, country, city (in hierarchies, the "bigger" dimension comes
> first), you will only need the following three group-by combinations
> when you do drill-down analysis:
>
> group by continent
> group by continent, country
> group by continent, country, city
>
> In such cases the combination count is reduced from 2^3=8 to 3, which is a
> great optimization. The same goes for the YEAR,QUARTER,MONTH,DATE case.
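
A minimal Python sketch of the counting argument above. This is not Kylin code; the function names are made up for illustration. It enumerates every group-by subset of three dimensions (including the empty one, which is how 2^3=8 is reached) and then only the drill-down prefixes that a hierarchy keeps.

```python
from itertools import combinations


def all_combinations(dims):
    """Every subset of dims, including the empty one: 2^N in total."""
    return [combo
            for size in range(len(dims) + 1)
            for combo in combinations(dims, size)]


def hierarchy_combinations(dims):
    """Only the drill-down prefixes of an ordered hierarchy: N in total."""
    return [tuple(dims[:size]) for size in range(1, len(dims) + 1)]


# "Bigger" dimension first, as in the example above.
hierarchy = ["continent", "country", "city"]

full = all_combinations(hierarchy)
pruned = hierarchy_combinations(hierarchy)

print(len(full), "combinations without the hierarchy")
print(len(pruned), "combinations with the hierarchy")
for combo in pruned:
    print("group by " + ", ".join(combo))
```

The three printed group-by lines are the same three listed above.
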
>
> If we denote the hierarchy dimensions as H1,H2,H3, typical scenarios would
> be:
>
> *A. Hierarchies on lookup table*
>
> Fact  Table                            (joins)         Lookup Table
> ===================                         =============
> column1,column2,,,,,, FK                         PK,,H1,H2,H3,,,,
>
> *B. Hierarchies on fact table*
>
> Fact  Table
> ===========================
> column1,column2,,,H1,H2,H3,,,,,,,
>
> There is a special case of scenario A, where the PK on the lookup table
> happens to be part of the hierarchy. For example, we have a calendar
> lookup table where cal_dt is the primary key:
>
> *A*. Hierarchies on lookup table over its primary key*
>
> Lookup Table(Calendar)
> ==============================================
> cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
>
> For cases like A*, what you need is another optimization called "Derived
> Columns".
>
> *Derived Columns:*
>
> A derived column is used when one or more dimensions (they must be
> dimensions on the lookup table; these columns are called "derived") can be
> deduced from another (usually the corresponding FK, which is called the
> "host column").
>
> For example, suppose we have a lookup table that we join to the fact table
> with "where DimA = DimX". Note that in Kylin, if you choose the FK as a
> dimension, the corresponding PK will automatically be queryable, without any
> extra cost. The secret is that since FK and PK are always identical, Kylin
> can apply filters/group-by on the FK first, and transparently replace it
> with the PK. This means that if we want DimA(FK), DimX(PK), DimB, DimC
> in our cube, we can safely choose DimA,DimB,DimC only.
>
>
> Fact  Table                                     (joins)           Lookup Table
> ========================                         =============
> column1,column2,,,,,, DimA(FK)                         DimX(PK),,DimB, DimC
>
> Let's say that DimA(the dimension representing FK/PK) has a special mapping
> to DimB:
>
> dimA    dimB  dimC
> 1           a        ?
> 2           b        ?
> 3           c        ?
> 4           a        ?
>
> In this case, given a value in DimA, the value of DimB is determined, so we
> say DimB can be derived from DimA. When we build a cube that contains both
> DimA and DimB, we simply include DimA and mark DimB as derived. The derived
> column (DimB) does not participate in cuboid generation:
>
> original combinations:
> ABC,AB,AC,BC,A,B,C
>
> combinations when deriving B from A:
> AC,A,C
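
The two lists above can be reproduced with a short Python sketch. It only illustrates the combinatorics; the helper names are hypothetical and have nothing to do with Kylin's internals. Derived columns are simply left out before the combinations are enumerated.

```python
from itertools import combinations


def non_empty_combinations(dims):
    """All non-empty dimension combinations, largest first."""
    return ["".join(combo)
            for size in range(len(dims), 0, -1)
            for combo in combinations(dims, size)]


def cuboid_dimensions(dims, derived):
    """Drop derived columns: they do not take part in cuboid generation."""
    return [dim for dim in dims if dim not in derived]


dims = ["A", "B", "C"]

original = non_empty_combinations(dims)
with_b_derived = non_empty_combinations(cuboid_dimensions(dims, derived={"B"}))

print(",".join(original))
print(",".join(with_b_derived))
```
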
>
> At runtime, a query like "select count(*) from fact_table inner
> join lookup1 group by lookup1.dimB" expects a cuboid containing DimB
> to answer it. However, DimB appears in NONE of the cuboids because of the
> derived optimization. In this case, we modify the execution plan to make
> it group by DimA (its host column) first, and we get an intermediate answer
> like:
>
> DimA  count(*)
> 1          1
> 2          1
> 3          1
> 4          1
>
> Afterwards, Kylin will replace DimA values with DimB values (since both of
> their values are in the lookup table, Kylin can load the whole lookup table
> into memory and build a mapping for them), and the intermediate result
> becomes:
>
> DimB  count(*)
> a          1
> b          1
> c          1
> a          1
>
> After this, the runtime SQL engine (Calcite) will further aggregate the
> intermediate result to:
>
> DimB  count(*)
> a          2
> b          1
> c          1
>
> This step happens at query runtime; this is what "at the cost of
> extra runtime aggregation" means.
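
The three steps above (group by the host column, map host values to derived values, aggregate again) can be sketched in Python as follows. This is an illustration under stated assumptions, not Kylin's or Calcite's actual code: the function name and the in-memory data structures are invented, and the data is the DimA/DimB example from this mail.

```python
from collections import Counter

# Lookup table rows as (DimA, DimB) pairs, taken from the example above.
lookup_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "a")]

# Intermediate answer read from a cuboid that contains DimA: DimA -> count(*).
cuboid_result = {1: 1, 2: 1, 3: 1, 4: 1}


def answer_group_by_derived(cuboid_result, lookup_rows):
    """Turn a group-by on the host column into a group-by on the derived one."""
    # Build the host -> derived mapping from the in-memory lookup table.
    host_to_derived = dict(lookup_rows)
    totals = Counter()
    for host_value, count in cuboid_result.items():
        # Replace the DimA value with its DimB value, then aggregate again.
        # This second aggregation is the extra work done at query time.
        totals[host_to_derived[host_value]] += count
    return dict(totals)


final_result = answer_group_by_derived(cuboid_result, lookup_rows)
for dim_b, count in sorted(final_result.items()):
    print(dim_b, count)
```
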
>
>
>
>
> On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]> wrote:
>
> > Sorry to be a bit annoying with the topic but I tried different cubes /
> > hierarchies and can never join.
> >
> > Without this basically I cannot use Kylin on PROD for my project.
> >
> > The stack trace:
> >
> > http-bio-7070-exec-3]:[2015-07-31 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)] -
> >
> > org.apache.kylin.rest.exception.InternalErrorException: Can't find any
> > realization. Please confirm with providers. SQL digest: fact table
> > DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
> > aggregates[].
> >
> > while executing SQL: "select id from sample_dim group by id LIMIT 50000"
> >
> >         at org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryController.java:223)
> >         at org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.java:174)
> >         at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:91)
> >         at org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039d0b.invoke(<generated>)
> >         at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> >         at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
> >         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> >         at com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:48)
> >         at com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:34)
> >         at com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(AbstractMetricMethodInterceptor.java:59)
> >         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> >         at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
> >         at org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b607924.query(<generated>)
> >         at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
> >         at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
> >         at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
> >         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
> >         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> >         at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
> >         at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
> >         at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
> >         at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
> >         at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:97)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
> >         at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
> >         at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.authentication.ui.DefaultLoginPageGeneratingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
> >         at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >         at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192)
> >         at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160)
> >         at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
> >         at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:64)
> >         at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
> >         at com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> >         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> >         at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
> >         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> >         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> >         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> >         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> >         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> >         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
> >         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> >         at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >         at java.lang.Thread.run(Thread.java:744)
> >
> >
> > The result for "select * from sample_dim"
> >
> > ID,DIM1,DIM2
> >
> > 33814,NYC,USA
> >
> > 201431,PARIS,FRANCE
> >
> > etc.
> >
> >
> >
> > On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]> wrote:
> >
> > > So with 0.7.2 the cube builds, and I can see some improvement:
> > >
> > > "select * from SAMPLE_DIM" now returns all the fields, i.e:
> > >
> > > dim1, dim2, dim3, etc., SAMPLE_ID
> > >
> > > and I can see all the values for each field.
> > >
> > > However the join between the fact table and the lookup table still does
> > > not work, it returns:
> > >
> > >     Can't find any realization.
> > >
> > > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it also
> > > returns:
> > >
> > >     Can't find any realization.
> > >
> > > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I get
> > > the list of all SAMPLE_ID as expected.
> > >
> > > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the list of
> > > all dim1 as expected.
> > >
> > > The same exact query works perfectly on Hive (although it takes a long
> > > time to be processed of course).
> > >
> > > Am I doing something wrong?
> > >
> > > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]> wrote:
> > >
> > >> Ok I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
> > >> right?
> > >>
> > >> I upgraded today to 0.7.2 and hope it solves the problem then.
> > >>
> > >> Regards
> > >>
> > >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]>
> > >> wrote:
> > >>
> > >>> I still don't understand this.
> > >>>
> > >>> I have a simple fact table and a simple SAMPLE_DIM lookup table. They
> > >>> are joined on SAMPLE_ID.
> > >>>
> > >>> If I do like you say and include all the columns of SAMPLE_DIM as a
> > >>> hierarchy and do not include the SAMPLE_ID then the cube builds
> > >>> successfully but I cannot query with the hierarchy. Any join results in
> > >>> this error:
> > >>>
> > >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
> > >>>
> > >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the hierarchy
> > >>> but not the SAMPLE_ID used to join with the fact table.
> > >>>
> > >>> If I include the SAMPLE_ID in the hierarchy definition then the cube
> > >>> build fails on step 3 with:
> > >>>
> > >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID does not exist in row key desc
> > >>> at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > >>> at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > >>> at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > >>> at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > >>> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > >>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>> at java.lang.Thread.run(Thread.java:744)
> > >>>
> > >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
> > >>>
> > >>> The only scenario I could make it work is when I also create a derived
> > >>> dimension SAMPLE_ID / something else, then somehow the SAMPLE_ID is
> > >>> included and can be queried.
> > >>>
> > >>> Any help with that?
> > >>>
> > >>>
> > >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Thanks for the answer,
> > >>>>
> > >>>> Indeed I had a look at these slides before and it's great to understand
> > >>>> the high level concepts but I ended up spending quite some time when
> > >>>> designing my dimensions with the issues mentioned below.
> > >>>>
> > >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Alex,
> > >>>>>
> > >>>>> We have a slide deck to help you understand how to build a cube. I don't
> > >>>>> know whether you have read it? It will help you understand derived
> > >>>>> and hierarchy dimensions.
> > >>>>>
> > >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > >>>>>
> > >>>>> For your case about hierarchy, log_date should not be included in the
> > >>>>> hierarchy. There's a bug here that you helped find; we will follow up
> > >>>>> on it.
> > >>>>>
> > >>>>> Also, more documentation and UI enhancements will be done to help users
> > >>>>> build cubes easily.
> > >>>>>
> > >>>>> Thanks!!
> > >>>>>
> > >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>> > I am trying to create a simple cube with a fact table and 3
> > >>>>> > dimensions.
> > >>>>> >
> > >>>>> > I have read the different slideshares and wiki pages, but I found
> > >>>>> > that the documentation is not very specific on how to manage
> > >>>>> > hierarchies.
> > >>>>> >
> > >>>>> > Let's take this simple example :
> > >>>>> >
> > >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> > >>>>> >
> > >>>>> > Date lookup table : logDate, week, month, quarter, etc.
> > >>>>> >
> > >>>>> > I specified Left join on logDate, actually when I specify this I
> > >>>>> > find it not very clear which one is considered to be the Left table
> > >>>>> > and which one is considered to be the Right table. I assumed the
> > >>>>> > Fact table was the left table and the Lookup table the right table,
> > >>>>> > looking at it now I think that might be a mistake (I am just
> > >>>>> > interested in dates for which there are results in the fact table).
> > >>>>> >
> > >>>>> > If I use the auto generator it creates a derived dimension, I don't
> > >>>>> > think that's what I need.
> > >>>>> >
> > >>>>> > So I created a hierarchy, but again to me it's not clearly indicated
> > >>>>> > if I should create ["quarter", "month", "week", "log_date"] or
> > >>>>> > ["logDate", "week", "month", "quarter"]?
> > >>>>> >
> > >>>>> > Also should I include log_date in the hierarchy? To me it was more
> > >>>>> > intuitive not to include it because it's already the join, but it
> > >>>>> > created the cube without it and I cannot query by date, it says that
> > >>>>> > "log_date" is not found in the date table (it is in the Hive table
> > >>>>> > but not the cube built). If I include it in the hierarchy the cube
> > >>>>> > build fails with this error :
> > >>>>> >
> > >>>>> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE does not exist in row key desc
> > >>>>> >         at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > >>>>> >         at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > >>>>> >         at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > >>>>> >         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > >>>>> >         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > >>>>> >         at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > >>>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > >>>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > >>>>> >         at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > >>>>> >         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>>>> >         at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > >>>>> >         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>>>> >         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > >>>>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>>> >         at java.lang.Thread.run(Thread.java:744)
> > >>>>> >
> > >>>>> > result code:2
> > >>>>> >
> > >>>>> >
> > >>>>> > I think it might be useful to improve the documentation to explain
> > >>>>> > this more clearly and not just the basic steps because building a
> > >>>>> > cube even on short time ranges takes some time so learning by
> > >>>>> > trial / error is very time consuming.
> > >>>>> >
> > >>>>> > Same thing for the derived dimensions, should I include ["storeID",
> > >>>>> > "storeName"] or just ["storeName"]? The second option seems to work
> > >>>>> > for me.
> > >>>>> >
> > >>>>> > Thanks
> > >>>>> >
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
