Hi Alex,
First, the discussion about concept is about how to improve
documentation
and relative stuff, since people are talking about those here, everyone
will be
benefited from such discussion.
Then, when you are try to query on LOOKUP table, it will not works when
you
apply group by, since there's no cube for it. query on lookup table is only
support
simple query just like which works for you.
KYLIN-831 is the bug when have FK in query but not included in cube
definition,
Please go with 0.7.2.
And, when talk about join, it should use same join condition when you
build the cube,
for example, inner join only works when you have one cube using inner join
to build it,
left join only works when you have cube using left join.
As reading your message, I'm wondering you are trying to pull data from
lookup table
with group by and then encounter such issue, right?
Hope these explain could bring answers for you.
And please feel free to continue discuss if there's still issue, or not
answered question
you have, we would like to help on.
Thanks.
Luke
Best Regards!
---------------------
Luke Han
On Wed, Aug 5, 2015 at 10:16 PM, alex schufo <[email protected]> wrote:
> Hi,
>
> Thank you for your answer.
>
> No, nobody else was replying, I was just trying to give you guys as much
> information as possible, hence the successive emails trying to be as
> detailed as possible. But all this is the same problem and data set from
> the start.
>
> To be honest I don't really understand what is it you don't understand,
> except the possible confusion that the thread is spread in successive
> emails with additional details. I think I tried to explain my problem in
> plain English, gave SQL query examples, output examples, exact copy of
> Kylin error messages and full stack trace as well as the Jira link that I
> assumed relative to my initial problem.
>
> Let me try to summarize again:
>
> My problem is not really in understanding what is a hierarchy and what is a
> derived column, but anyway thank you for your detailed explanation. It
> didn't really help me but I agree with others on this thread that this is a
> good quality summary and would benefit everyone to be featured on Kylin
> website / documentation.
>
> My problem is that I used 0.7.1 to build a cube with a hierarchy on lookup
> table and the process is successful. But at query time the join between the
> FK of the fact table and the PK of the Lookup table was failing due to
>
> "java.lang.NullPointerException: Column {LOOKUP.PK} does not exist in row
> key desc"
>
> I wondered if this has been solved in 0.7.2 so I upgraded (in particular I
> noticed this Jira https://issues.apache.org/jira/browse/KYLIN-831 which
> seemed relevant), tried again and saw improvement (PK seemed apparent this
> time) but my join still failed with the following error:
>
> "Can't find any realization."
>
> Anyway, I managed to find the reason of this problem. I was trying to do a
> left join. When I do a query with a join I get the expected result. I had
> tried to play with left join / inner join / right join condition in the
> cube definition before but couldn't make it work. The combination that made
> it work for me was to use an inner join in the cube definition and also
> when querying.
>
>
> On Mon, Aug 3, 2015 at 4:32 AM, hongbin ma <[email protected]> wrote:
>
> > hi alex,
> >
> > I'm not quite following this thread? It looks like except jason
> > recommending your the slideshare link, no one is replying you. To whom
> are
> > your communicating? (Is someone sending reply to you in person instead of
> > to the dev list?)
> >
> > Can you re-summarize your problem again and I might be able to help you.
> > I found you're confused by hierarchies and derived columns, here's some
> > take-aways, I'm recently summarizing them to formal docs:
> >
> > *Hierarchies:*
> >
> > Theoretically for N dimensions you'll end up with 2^N dimension
> > combinations. However for some group of dimensions there are no need to
> > create so many combinations. For example, if you have three dimensions:
> > continent, country, city (In hierarchies, the "bigger" dimension comes
> > first). You will only need the following three combinations of group by
> > when you do drill down analysis:
> >
> > group by continent
> > group by continent, country
> > group by continent, country, city
> >
> > In such cases the combination count is reduced from 2^3=8 to 3, which is
> a
> > great optimization. The same goes for the YEAR,QUATER,MONTH,DATE case.
> >
> > If we Donate the hierarchy dimension as H1,H2,H3, typical scenarios would
> > be:
> >
> > *A. Hierarchies on lookup table*
> >
> > Fact Table (joins) Lookup Table
> > =================== =============
> > column1,column2,,,,,, FK PK,,H1,H2,H3,,,,
> >
> > B. Hierarchies on fact table
> >
> > Fact Table
> > ===========================
> > column1,column2,,,H1,H2,H3,,,,,,,
> >
> > There is a special case for scenario A, where PK on the lookup table is
> > accidentally being part of the hierarchies. For example we have a
> calendar
> > lookup table where cal_dt is the primary key:
> >
> > *A*. Hierarchies on lookup table over its primary key*
> >
> > Lookup Table(Calendar)
> > ==============================================
> > cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
> >
> > For cases like A* what you need is another optimization called "Derived
> > Columns"
> >
> > *Derived Columns:*
> >
> > Derived column is used when one or more dimensions (They must be
> dimension
> > on lookup table, these columns are called "Derived") can be deduced from
> > another(Usually it is the corresponding FK, this is called the "host
> > column")
> >
> > For example, suppose we have a lookup table where we join fact table and
> it
> > with "where DimA = DimX". Notice in Kylin, if you choose FK into a
> > dimension, the corresponding PK will be automatically querable, without
> any
> > extra cost. The secret is that since FK and PK are always identical,
> Kylin
> > can apply filters/groupby on the FK first, and transparently replace them
> > to PK. This indicates that if we want the DimA(FK), DimX(PK), DimB, DimC
> > in our cube, we can safely choose DimA,DimB,DimC only.
> >
> >
> > Fact Table (joins) Lookup
> > Table
> > ======================== =============
> > column1,column2,,,,,, DimA(FK) DimX(PK),,DimB,
> DimC
> >
> > Let's say that DimA(the dimension representing FK/PK) has a special
> mapping
> > to DimB:
> >
> > dimA dimB dimC
> > 1 a ?
> > 2 b ?
> > 3 c ?
> > 4 a ?
> >
> > in this case, given a value in DimA, the value of DimB is determined, so
> we
> > say dimB can be derived from DimA. When we build a cube that contains
> both
> > DimA and DimB, we simple include DimA, and marking DimB as derived.
> Derived
> > column(DimB) does not participant in cuboids generation:
> >
> > original combinations:
> > ABC,AB,AC,BC,A,B,C
> >
> > combinations when driving B from A:
> > AC,A,C
> >
> > at Runtime, in case queries like "select count(*) from fact_table inner
> > join looup1 group by looup1 .dimB", it is expecting cuboid containing
> DimB
> > to answer the query. However, DimB will appear in NONE of the cuboids due
> > to derived optimization. In this case, we modify the execution plan to
> make
> > it group by DimA(its host column) first, we'll get intermediate answer
> > like:
> >
> > DimA count(*)
> > 1 1
> > 2 1
> > 3 1
> > 4 1
> >
> > Afterwards, Kylin will replace DimA values with DimB values(since both of
> > their values are in lookup table, Kylin can load the whole lookup table
> > into memory and build a mapping for them), and the intermediate result
> > becomes:
> >
> > DimB count(*)
> > a 1
> > b 1
> > c 1
> > a 1
> >
> > After this, the runtime SQL engine(calcite) will further aggregate the
> > intermediate result to:
> >
> > DimB count(*)
> > a 2
> > b 1
> > c 1
> >
> > this step happens at query runtime, this is what it means "at the cost of
> > extra runtime aggregation"
> >
> >
> >
> >
> > On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]>
> wrote:
> >
> > > Sorry to be a bit annoying with the topic but I tried different cubes /
> > > hierarchies and can never join.
> > >
> > > Without this basically I cannot use Kylin on PROD for my project.
> > >
> > > The stack trace:
> > >
> > > http-bio-7070-exec-3]:[2015-07-31
> > >
> > >
> >
> 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)]
> > > -
> > >
> > > org.apache.kylin.rest.exception.InternalErrorException: Can't find any
> > > realization. Please confirm with providers. SQL digest: fact table
> > > DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
> > > aggregates[].
> > >
> > > while executing SQL: "select id from sample_dim group by id LIMIT
> 50000"
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryController.java:223)
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.java:174)
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.controller.QueryController.query(QueryController.java:91)
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039d0b.invoke(<generated>)
> > >
> > > at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> > >
> > > at
> > >
> > >
> >
> org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
> > >
> > > at
> > >
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> > >
> > > at
> > >
> > >
> >
> com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:48)
> > >
> > > at
> > >
> > >
> >
> com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:34)
> > >
> > > at
> > >
> > >
> >
> com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(AbstractMetricMethodInterceptor.java:59)
> > >
> > > at
> > >
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> > >
> > > at
> > >
> > >
> >
> org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b607924.query(<generated>)
> > >
> > > at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown
> Source)
> > >
> > > at
> > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >
> > > at java.lang.reflect.Method.invoke(Method.java:606)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
> > >
> > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
> > >
> > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > > org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > >
> > >
> >
> com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:97)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.authentication.ui.DefaultLoginPageGeneratingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192)
> > >
> > > at
> > >
> > >
> >
> org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > >
> > >
> >
> org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:64)
> > >
> > > at
> > >
> > >
> >
> org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
> > >
> > > at
> > > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> > >
> > > at
> > >
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> > >
> > > at
> > >
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> > >
> > > at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> > >
> > > at
> > >
> > >
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
> > >
> > > at
> > >
> > >
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> > >
> > > at
> > >
> > >
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
> > >
> > > at
> > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >
> > > at
> > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >
> > > at
> > >
> > >
> >
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > >
> > > at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > The result for "select * from sample_dim"
> > >
> > > ID,DIM1,DIM2
> > >
> > > 33814,NYC,USA
> > >
> > > 201431,PARIS,FRANCE
> > >
> > > etc.
> > >
> > >
> > >
> > > On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]>
> > wrote:
> > >
> > > > So with 0.7.2 the cube builds, and I can see some improvement:
> > > >
> > > > "select * from SAMPLE_DIM" now returns all the fields, i.e:
> > > >
> > > > dim1, dim2, dim3, etc., SAMPLE_ID
> > > >
> > > > and I can see all the values for each field.
> > > >
> > > > However the join between the fact table and the lookup table still
> does
> > > > not work, it returns:
> > > >
> > > > Can't find any realization.
> > > >
> > > > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it
> > also
> > > > returns:
> > > >
> > > > Can't find any realization.
> > > >
> > > > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I
> > get
> > > > the list of all SAMPLE_ID as expected.
> > > >
> > > > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the
> list
> > > of
> > > > all dim1 as expected.
> > > >
> > > > The same exact query works perfectly on Hive (although it takes a
> long
> > > > time to be processed of course).
> > > >
> > > > Am I doing something wrong?
> > > >
> > > > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]>
> > > wrote:
> > > >
> > > >> Ok I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
> > > >> right?
> > > >>
> > > >> I upgraded today to 0.7.2 and hope it solves the problem then.
> > > >>
> > > >> Regards
> > > >>
> > > >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> I still don't understand this.
> > > >>>
> > > >>> I have a simple fact table and a simple SAMPLE_DIM lookup table.
> They
> > > >>> are joined on SAMPLE_ID.
> > > >>>
> > > >>> If I do like you say and include all the columns of SAMPLE_DIM as a
> > > >>> hierarchy and do not include the SAMPLE_ID then the cube builds
> > > >>> successfully but I cannot query with the hierarchy. Any join
> results
> > in
> > > >>> this error:
> > > >>>
> > > >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
> > > >>>
> > > >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the
> > hierarchy
> > > >>> but not the SAMPLE_ID used to join with the fact table.
> > > >>>
> > > >>> If I include the SAMPLE_ID in the hierarchy definition then the
> cube
> > > >>> build fails on step 3 with:
> > > >>>
> > > >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID
> > > does
> > > >>> not exist in row key desc
> > > >>> at
> > > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > > >>> at
> > > >>>
> > >
> org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > > >>> at
> > > >>>
> > >
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > > >>> at
> > > >>>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>> at
> > > >>>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>> at java.lang.Thread.run(Thread.java:744)
> > > >>>
> > > >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
> > > >>>
> > > >>> The only scenario I could make it work is when I also create a
> > derived
> > > >>> dimension SAMPLE_ID / something else, then somehow the SAMPLE_ID is
> > > >>> included and can be queried.
> > > >>>
> > > >>> Any help with that?
> > > >>>
> > > >>>
> > > >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]
> >
> > > >>> wrote:
> > > >>>
> > > >>>> Thanks for the answer,
> > > >>>>
> > > >>>> Indeed I had a look at these slides before and it's great to
> > > understand
> > > >>>> the high level concepts but I ended up spending quite some time
> when
> > > >>>> designing my dimensions with the issues mentioned below.
> > > >>>>
> > > >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <
> > [email protected]
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi Alex,
> > > >>>>>
> > > >>>>> We have a slide to hlep you understand how to build cube.I don't
> > know
> > > >>>>> whether you have read this? This will hlep you understand derived
> > and
> > > >>>>> hierarchy.
> > > >>>>>
> > > >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > > >>>>>
> > > >>>>> for your case about hierarchy,log_date should not be included in
> > > >>>>> hierarchy
> > > >>>>> ,here's a bug you help find it.we will follow this.
> > > >>>>>
> > > >>>>> also .more document and UI enhancement will be done to help user
> > > build
> > > >>>>> cube
> > > >>>>> easily.
> > > >>>>>
> > > >>>>> Thanks!!
> > > >>>>>
> > > >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <
> [email protected]
> > >
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> > I am trying to create a simple cube with a fact table and 3
> > > >>>>> dimensions.
> > > >>>>> >
> > > >>>>> > I have read the different slideshares and wiki pages, but I
> found
> > > >>>>> that the
> > > >>>>> > documentation is not very specific on how to manage
> hierarchies.
> > > >>>>> >
> > > >>>>> > Let's take this simple example :
> > > >>>>> >
> > > >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> > > >>>>> >
> > > >>>>> > Date lookup table : logDate, week, month, quarter, etc.
> > > >>>>> >
> > > >>>>> > I specified Left join on logDate, actually when I specify this
> I
> > > >>>>> find it
> > > >>>>> > not very clear which one is considered to be the Left table and
> > > >>>>> which one
> > > >>>>> > is considered to be the Right table. I assumed the Fact table
> was
> > > >>>>> the left
> > > >>>>> > table and the Lookup table the right table, looking at it now I
> > > >>>>> think that
> > > >>>>> > might be a mistake (I am just interested in dates for which
> there
> > > are
> > > >>>>> > results in the fact table).
> > > >>>>> >
> > > >>>>> > If I use the auto generator it creates a derived dimension, I
> > don't
> > > >>>>> think
> > > >>>>> > that's what I need.
> > > >>>>> >
> > > >>>>> > So I created a hierarchy, but again to me it's clearly
> indicated
> > > if I
> > > >>>>> > should create ["quarter", "month", "week", "log_date"] or
> > > ["logDate",
> > > >>>>> > "week", "month", "quarter"]?
> > > >>>>> >
> > > >>>>> > Also should I include log_date in the hierarchy? To me it was
> > more
> > > >>>>> > intuitive not to include it because it's already the join, but
> it
> > > >>>>> created
> > > >>>>> > the cube without it and I cannot query by date, it says that
> > > >>>>> "log_date" is
> > > >>>>> > not found in the date table (it is in the Hive table but not
> the
> > > cube
> > > >>>>> > built). If I include it in the hierarchy the cube build fails
> > with
> > > >>>>> this
> > > >>>>> > error :
> > > >>>>> >
> > > >>>>> > java.lang.NullPointerException: Column
> > DEFAULT.DATE_TABLE.LOG_DATE
> > > >>>>> > does not exist in row key desc
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > > >>>>> > at
> > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > >>>>> > at
> > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>>>> > at
> > > >>>>> >
> > > >>>>>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>>>> > at java.lang.Thread.run(Thread.java:744)
> > > >>>>> >
> > > >>>>> > result code:2
> > > >>>>> >
> > > >>>>> >
> > > >>>>> > I think it might be useful to improve the documentation to
> > explain
> > > >>>>> this
> > > >>>>> > more clearly and not just the basic steps because building a
> cube
> > > >>>>> even on
> > > >>>>> > short time ranges takes some time so learning by trial / error
> is
> > > >>>>> very time
> > > >>>>> > consuming.
> > > >>>>> >
> > > >>>>> > Same thing for the derived dimensions, should I include
> > ["storeID",
> > > >>>>> > "storeName"] or just ["storeName"]? The second option seems to
> > work
> > > >>>>> for me.
> > > >>>>> >
> > > >>>>> > Thanks
> > > >>>>> >
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>