Re: Modeling hierarchies

alex schufo Wed, 05 Aug 2015 07:17:46 -0700

Hi,

Thank you for your answer.


No, nobody else was replying, I was just trying to give you guys as much
information as possible, hence the successive emails trying to be as
detailed as possible. But all this is the same problem and data set from
the start.

To be honest I don't really understand what is it you don't understand,
except the possible confusion that the thread is spread in successive
emails with additional details. I think I tried to explain my problem in
plain English, gave SQL query examples, output examples, exact copy of
Kylin error messages and full stack trace as well as the Jira link that I
assumed relative to my initial problem.

Let me try to summarize again:

My problem is not really in understanding what is a hierarchy and what is a
derived column, but anyway thank you for your detailed explanation. It
didn't really help me but I agree with others on this thread that this is a
good quality summary and would benefit everyone to be featured on Kylin
website / documentation.

My problem is that I used 0.7.1 to build a cube with a hierarchy on lookup
table and the process is successful. But at query time the join between the
FK of the fact table and the PK of the Lookup table was failing due to

"java.lang.NullPointerException: Column {LOOKUP.PK} does not exist in row
key desc"

I wondered if this has been solved in 0.7.2 so I upgraded (in particular I
noticed this Jira https://issues.apache.org/jira/browse/KYLIN-831 which
seemed relevant), tried again and saw improvement (PK seemed apparent this
time) but my join still failed with the following error:

"Can't find any realization."

Anyway, I managed to find the reason of this problem. I was trying to do a
left join. When I do a query with a join I get the expected result. I had
tried to play with left join / inner join / right join condition in the
cube definition before but couldn't make it work. The combination that made
it work for me was to use an inner join in the cube definition and also
when querying.


On Mon, Aug 3, 2015 at 4:32 AM, hongbin ma <[email protected]> wrote:

> hi alex,
>
> I'm not quite following this thread? It looks like except jason
> recommending your the slideshare link, no one is replying you. To whom are
> your communicating? (Is someone sending reply to you in person instead of
> to the dev list?)
>
> Can you re-summarize your problem again and I might be able to help you.
> I found you're confused by hierarchies and derived columns, here's some
> take-aways, I'm recently summarizing them to formal docs:
>
> *Hierarchies:*
>
> Theoretically for N dimensions you'll end up with 2^N dimension
> combinations. However for some group of dimensions there are no need to
> create so many combinations. For example, if you have three dimensions:
> continent, country, city (In hierarchies, the "bigger" dimension comes
> first). You will only need the following three combinations of group by
> when you do drill down analysis:
>
> group by continent
> group by continent, country
> group by continent, country, city
>
> In such cases the combination count is reduced from 2^3=8 to 3, which is a
> great optimization. The same goes for the YEAR,QUATER,MONTH,DATE case.
>
> If we Donate the hierarchy dimension as H1,H2,H3, typical scenarios would
> be:
>
> *A. Hierarchies on lookup table*
>
> Fact  Table                            (joins)         Lookup Table
> ===================                         =============
> column1,column2,,,,,, FK                         PK,,H1,H2,H3,,,,
>
> B. Hierarchies on fact table
>
> Fact  Table
> ===========================
> column1,column2,,,H1,H2,H3,,,,,,,
>
> There is a special case for scenario A, where PK on the lookup table is
> accidentally being part of the hierarchies. For example we have a calendar
> lookup table where cal_dt is the primary key:
>
> *A*. Hierarchies on lookup table over its primary key*
>
> Lookup Table(Calendar)
> ==============================================
> cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
>
> For cases like A* what you need is another optimization called "Derived
> Columns"
>
> *Derived Columns:*
>
> Derived column is used when one or more dimensions (They must be dimension
> on lookup table, these columns are called "Derived") can be deduced from
> another(Usually it is the corresponding FK, this is called the "host
> column")
>
> For example, suppose we have a lookup table where we join fact table and it
> with "where DimA = DimX". Notice in Kylin, if you choose FK into a
> dimension, the corresponding PK will be automatically querable, without any
> extra cost. The secret is that since FK and PK are always identical, Kylin
> can apply filters/groupby on the FK first, and transparently replace them
> to PK.  This indicates that if we want the DimA(FK), DimX(PK), DimB, DimC
> in our cube, we can safely choose DimA,DimB,DimC only.
>
>
> Fact  Table                                     (joins)           Lookup
> Table
> ========================                         =============
> column1,column2,,,,,, DimA(FK)                         DimX(PK),,DimB, DimC
>
> Let's say that DimA(the dimension representing FK/PK) has a special mapping
> to DimB:
>
> dimA    dimB  dimC
> 1           a        ?
> 2           b        ?
> 3           c        ?
> 4           a        ?
>
> in this case, given a value in DimA, the value of DimB is determined, so we
> say dimB can be derived from DimA. When we build a cube that contains both
> DimA and DimB, we simple include DimA, and marking DimB as derived. Derived
> column(DimB) does not participant in cuboids generation:
>
> original combinations:
> ABC,AB,AC,BC,A,B,C
>
> combinations when driving B from A:
> AC,A,C
>
> at Runtime, in case queries like "select count(*) from fact_table inner
> join looup1 group by looup1 .dimB", it is expecting cuboid containing DimB
> to answer the query. However, DimB will appear in NONE of the cuboids due
> to derived optimization. In this case, we modify the execution plan to make
> it group by  DimA(its host column) first, we'll get intermediate answer
> like:
>
> DimA  count(*)
> 1          1
> 2          1
> 3          1
> 4          1
>
> Afterwards, Kylin will replace DimA values with DimB values(since both of
> their values are in lookup table, Kylin can load the whole lookup table
> into memory and build a mapping for them), and the intermediate result
> becomes:
>
> DimB  count(*)
> a          1
> b          1
> c          1
> a          1
>
> After this, the runtime SQL engine(calcite) will further aggregate the
> intermediate result to:
>
> DimB  count(*)
> a          2
> b          1
> c          1
>
> this step happens at query runtime, this is what it means "at the cost of
> extra runtime aggregation"
>
>
>
>
> On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]> wrote:
>
> > Sorry to be a bit annoying with the topic but I tried different cubes /
> > hierarchies and can never join.
> >
> > Without this basically I cannot use Kylin on PROD for my project.
> >
> > The stack trace:
> >
> > http-bio-7070-exec-3]:[2015-07-31
> >
> >
> 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)]
> > -
> >
> > org.apache.kylin.rest.exception.InternalErrorException: Can't find any
> > realization. Please confirm with providers. SQL digest: fact table
> > DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
> > aggregates[].
> >
> > while executing SQL: "select id from sample_dim group by id LIMIT 50000"
> >
> >         at
> >
> >
> org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryController.java:223)
> >
> >         at
> >
> >
> org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.java:174)
> >
> >         at
> >
> >
> org.apache.kylin.rest.controller.QueryController.query(QueryController.java:91)
> >
> >         at
> >
> >
> org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039d0b.invoke(<generated>)
> >
> >         at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> >
> >         at
> >
> >
> org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
> >
> >         at
> >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> >
> >         at
> >
> >
> com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:48)
> >
> >         at
> >
> >
> com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:34)
> >
> >         at
> >
> >
> com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(AbstractMetricMethodInterceptor.java:59)
> >
> >         at
> >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> >
> >         at
> >
> >
> org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
> >
> >         at
> >
> >
> org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b607924.query(<generated>)
> >
> >         at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
> >
> >         at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >
> >         at
> >
> >
> org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
> >
> >         at
> >
> >
> org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
> >
> >         at
> >
> >
> org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
> >
> >         at
> >
> >
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
> >
> >         at
> >
> >
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> >
> >         at
> >
> >
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
> >
> >         at
> >
> >
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
> >
> >         at
> >
> >
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
> >
> >         at
> >
> >
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
> >
> >         at
> >
> >
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
> >
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
> >
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> > org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> >
> >
> com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:97)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
> >
> >         at
> >
> >
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
> >
> >         at
> >
> >
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >        at
> >
> >
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.authentication.ui.DefaultLoginPageGeneratingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192)
> >
> >         at
> >
> >
> org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160)
> >
> >         at
> >
> >
> org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
> >
> >         at
> >
> >
> org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> >
> >
> org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:64)
> >
> >         at
> >
> >
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
> >
> >         at
> > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> >
> >         at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> >
> >         at
> >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> >
> >         at
> >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> >
> >         at
> >
> >
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
> >
> >         at
> >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> >
> >         at
> >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> >
> >         at
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> >
> >         at
> >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> >
> >         at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> >
> >         at
> >
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
> >
> >         at
> >
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> >
> >         at
> >
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >
> >         at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >         at
> >
> >
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >
> >         at java.lang.Thread.run(Thread.java:744)
> >
> >
> > The result for "select * from sample_dim"
> >
> > ID,DIM1,DIM2
> >
> > 33814,NYC,USA
> >
> > 201431,PARIS,FRANCE
> >
> > etc.
> >
> >
> >
> > On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]>
> wrote:
> >
> > > So with 0.7.2 the cube builds, and I can see some improvement:
> > >
> > > "select * from SAMPLE_DIM" now returns all the fields, i.e:
> > >
> > > dim1, dim2, dim3, etc., SAMPLE_ID
> > >
> > > and I can see all the values for each field.
> > >
> > > However the join between the fact table and the lookup table still does
> > > not work, it returns:
> > >
> > >     Can't find any realization.
> > >
> > > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it
> also
> > > returns:
> > >
> > >     Can't find any realization.
> > >
> > > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I
> get
> > > the list of all SAMPLE_ID as expected.
> > >
> > > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the list
> > of
> > > all dim1 as expected.
> > >
> > > The same exact query works perfectly on Hive (although it takes a long
> > > time to be processed of course).
> > >
> > > Am I doing something wrong?
> > >
> > > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]>
> > wrote:
> > >
> > >> Ok I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
> > >> right?
> > >>
> > >> I upgraded today to 0.7.2 and hope it solves the problem then.
> > >>
> > >> Regards
> > >>
> > >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]>
> > >> wrote:
> > >>
> > >>> I still don't understand this.
> > >>>
> > >>> I have a simple fact table and a simple SAMPLE_DIM lookup table. They
> > >>> are joined on SAMPLE_ID.
> > >>>
> > >>> If I do like you say and include all the columns of SAMPLE_DIM as a
> > >>> hierarchy and do not include the SAMPLE_ID then the cube builds
> > >>> successfully but I cannot query with the hierarchy. Any join results
> in
> > >>> this error:
> > >>>
> > >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
> > >>>
> > >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the
> hierarchy
> > >>> but not the SAMPLE_ID used to join with the fact table.
> > >>>
> > >>> If I include the SAMPLE_ID in the hierarchy definition then the cube
> > >>> build fails on step 3 with:
> > >>>
> > >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID
> > does
> > >>> not exist in row key desc
> > >>> at
> > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > >>> at
> > >>>
> > org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > >>> at
> > >>>
> >
> org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > >>> at
> > >>>
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > >>> at
> > >>>
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>> at
> > >>>
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > >>> at
> > >>>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>> at
> > >>>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>> at java.lang.Thread.run(Thread.java:744)
> > >>>
> > >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
> > >>>
> > >>> The only scenario I could make it work is when I also create a
> derived
> > >>> dimension SAMPLE_ID / something else, then somehow the SAMPLE_ID is
> > >>> included and can be queried.
> > >>>
> > >>> Any help with that?
> > >>>
> > >>>
> > >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Thanks for the answer,
> > >>>>
> > >>>> Indeed I had a look at these slides before and it's great to
> > understand
> > >>>> the high level concepts but I ended up spending quite some time when
> > >>>> designing my dimensions with the issues mentioned below.
> > >>>>
> > >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <
> [email protected]
> > >
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Alex,
> > >>>>>
> > >>>>> We have a slide to hlep you understand how to build cube.I don't
> know
> > >>>>> whether you have read this? This will hlep you understand derived
> and
> > >>>>> hierarchy.
> > >>>>>
> > >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > >>>>>
> > >>>>> for your case about hierarchy,log_date should not be included in
> > >>>>> hierarchy
> > >>>>> ,here's a bug you help find it.we will follow this.
> > >>>>>
> > >>>>> also .more document and UI enhancement will be done to help user
> > build
> > >>>>> cube
> > >>>>> easily.
> > >>>>>
> > >>>>> Thanks!!
> > >>>>>
> > >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]
> >
> > >>>>> wrote:
> > >>>>>
> > >>>>> > I am trying to create a simple cube with a fact table and 3
> > >>>>> dimensions.
> > >>>>> >
> > >>>>> > I have read the different slideshares and wiki pages, but I found
> > >>>>> that the
> > >>>>> > documentation is not very specific on how to manage hierarchies.
> > >>>>> >
> > >>>>> > Let's take this simple example :
> > >>>>> >
> > >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> > >>>>> >
> > >>>>> > Date lookup table : logDate, week, month, quarter, etc.
> > >>>>> >
> > >>>>> > I specified Left join on logDate, actually when I specify this I
> > >>>>> find it
> > >>>>> > not very clear which one is considered to be the Left table and
> > >>>>> which one
> > >>>>> > is considered to be the Right table. I assumed the Fact table was
> > >>>>> the left
> > >>>>> > table and the Lookup table the right table, looking at it now I
> > >>>>> think that
> > >>>>> > might be a mistake (I am just interested in dates for which there
> > are
> > >>>>> > results in the fact table).
> > >>>>> >
> > >>>>> > If I use the auto generator it creates a derived dimension, I
> don't
> > >>>>> think
> > >>>>> > that's what I need.
> > >>>>> >
> > >>>>> > So I created a hierarchy, but again to me it's clearly indicated
> > if I
> > >>>>> > should create ["quarter", "month", "week", "log_date"] or
> > ["logDate",
> > >>>>> > "week", "month", "quarter"]?
> > >>>>> >
> > >>>>> > Also should I include log_date in the hierarchy? To me it was
> more
> > >>>>> > intuitive not to include it because it's already the join, but it
> > >>>>> created
> > >>>>> > the cube without it and I cannot query by date, it says that
> > >>>>> "log_date" is
> > >>>>> > not found in the date table (it is in the Hive table but not the
> > cube
> > >>>>> > built). If I include it in the hierarchy the cube build fails
> with
> > >>>>> this
> > >>>>> > error :
> > >>>>> >
> > >>>>> > java.lang.NullPointerException: Column
> DEFAULT.DATE_TABLE.LOG_DATE
> > >>>>> > does not exist in row key desc
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> > org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > >>>>> >         at
> > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > >>>>> >         at
> > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>>>> >         at
> > >>>>> >
> > >>>>>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>>> >         at java.lang.Thread.run(Thread.java:744)
> > >>>>> >
> > >>>>> > result code:2
> > >>>>> >
> > >>>>> >
> > >>>>> > I think it might be useful to improve the documentation to
> explain
> > >>>>> this
> > >>>>> > more clearly and not just the basic steps because building a cube
> > >>>>> even on
> > >>>>> > short time ranges takes some time so learning by trial / error is
> > >>>>> very time
> > >>>>> > consuming.
> > >>>>> >
> > >>>>> > Same thing for the derived dimensions, should I include
> ["storeID",
> > >>>>> > "storeName"] or just ["storeName"]? The second option seems to
> work
> > >>>>> for me.
> > >>>>> >
> > >>>>> > Thanks
> > >>>>> >
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Modeling hierarchies

Reply via email to