There's one internal page to introduce all metadata, including terms, links, and samples.
and, why not you draft one blog for this? ;-) Thanks. Best Regards! --------------------- Luke Han On Mon, Aug 3, 2015 at 8:15 PM, hongbin ma <[email protected]> wrote: > Terminology page might be a good entry for these two terms, but I think the > details should belong to another page. Do we have a doc section to > introduce kylin internals, or something like "optimizing your cubes" ? > > On Mon, Aug 3, 2015 at 8:06 PM, Luke Han <[email protected]> wrote: > > > Terminology page is right place designed host such terms: > > > > http://kylin.incubator.apache.org/docs/gettingstarted/terminology.html > > <http://kylin.incubator.apache.org/docs/gettingstarted/terminology.html> > > > > Thanks. > > > > > > Best Regards! > > --------------------- > > > > Luke Han > > > > On Mon, Aug 3, 2015 at 2:58 PM, hongbin ma <[email protected]> wrote: > > > > > Hi Maria > > > > > > we do not have a formal doc for this now. > > > I was planning to refine the materials here in this thread, and publish > > it > > > as a formal document if the materials here answers your concerns. > > > > > > What are you expecting from the "formal doc"? > > > > > > On Mon, Aug 3, 2015 at 2:37 PM, Gaspare Maria < > > > [email protected]> wrote: > > > > > > > Hi, > > > > > > > > Good explanation. > > > > > > > > Could you share the link to the formal doc you *Hierarchies:* ? > > > > > > > > A complete example shown this details will also help. > > > > > > > > Thanks. > > > > > > > > Gaspare Maria > > > > > > > > <div>-------- Messaggio originale --------</div><div>Da: hongbin ma < > > > > [email protected]> </div><div>Data:03/08/2015 05:32 (GMT+01:00) > > > > </div><div>A: dev <[email protected]> > </div><div>Oggetto: > > > > Re: Modeling hierarchies </div><div> > > > > </div>hi alex, > > > > > > > > I'm not quite following this thread? It looks like except jason > > > > recommending your the slideshare link, no one is replying you. To > whom > > > are > > > > your communicating? (Is someone sending reply to you in person > instead > > of > > > > to the dev list?) > > > > > > > > Can you re-summarize your problem again and I might be able to help > > you. > > > > I found you're confused by hierarchies and derived columns, here's > some > > > > take-aways, I'm recently summarizing them to formal docs: > > > > > > > > *Hierarchies:* > > > > > > > > Theoretically for N dimensions you'll end up with 2^N dimension > > > > combinations. However for some group of dimensions there are no need > to > > > > create so many combinations. For example, if you have three > dimensions: > > > > continent, country, city (In hierarchies, the "bigger" dimension > comes > > > > first). You will only need the following three combinations of group > by > > > > when you do drill down analysis: > > > > > > > > group by continent > > > > group by continent, country > > > > group by continent, country, city > > > > > > > > In such cases the combination count is reduced from 2^3=8 to 3, which > > is > > > a > > > > great optimization. The same goes for the YEAR,QUATER,MONTH,DATE > case. > > > > > > > > If we Donate the hierarchy dimension as H1,H2,H3, typical scenarios > > would > > > > be: > > > > > > > > *A. Hierarchies on lookup table* > > > > > > > > Fact Table (joins) Lookup Table > > > > =================== ============= > > > > column1,column2,,,,,, FK PK,,H1,H2,H3,,,, > > > > > > > > B. Hierarchies on fact table > > > > > > > > Fact Table > > > > =========================== > > > > column1,column2,,,H1,H2,H3,,,,,,, > > > > > > > > There is a special case for scenario A, where PK on the lookup table > is > > > > accidentally being part of the hierarchies. For example we have a > > > calendar > > > > lookup table where cal_dt is the primary key: > > > > > > > > *A*. Hierarchies on lookup table over its primary key* > > > > > > > > Lookup Table(Calendar) > > > > ============================================== > > > > cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,, > > > > > > > > For cases like A* what you need is another optimization called > "Derived > > > > Columns" > > > > > > > > *Derived Columns:* > > > > > > > > Derived column is used when one or more dimensions (They must be > > > dimension > > > > on lookup table, these columns are called "Derived") can be deduced > > from > > > > another(Usually it is the corresponding FK, this is called the "host > > > > column") > > > > > > > > For example, suppose we have a lookup table where we join fact table > > and > > > it > > > > with "where DimA = DimX". Notice in Kylin, if you choose FK into a > > > > dimension, the corresponding PK will be automatically querable, > without > > > any > > > > extra cost. The secret is that since FK and PK are always identical, > > > Kylin > > > > can apply filters/groupby on the FK first, and transparently replace > > them > > > > to PK. This indicates that if we want the DimA(FK), DimX(PK), DimB, > > DimC > > > > in our cube, we can safely choose DimA,DimB,DimC only. > > > > > > > > > > > > Fact Table (joins) > > Lookup > > > > Table > > > > ======================== ============= > > > > column1,column2,,,,,, DimA(FK) > DimX(PK),,DimB, > > > DimC > > > > > > > > Let's say that DimA(the dimension representing FK/PK) has a special > > > mapping > > > > to DimB: > > > > > > > > dimA dimB dimC > > > > 1 a ? > > > > 2 b ? > > > > 3 c ? > > > > 4 a ? > > > > > > > > in this case, given a value in DimA, the value of DimB is determined, > > so > > > we > > > > say dimB can be derived from DimA. When we build a cube that contains > > > both > > > > DimA and DimB, we simple include DimA, and marking DimB as derived. > > > Derived > > > > column(DimB) does not participant in cuboids generation: > > > > > > > > original combinations: > > > > ABC,AB,AC,BC,A,B,C > > > > > > > > combinations when driving B from A: > > > > AC,A,C > > > > > > > > at Runtime, in case queries like "select count(*) from fact_table > inner > > > > join looup1 group by looup1 .dimB", it is expecting cuboid containing > > > DimB > > > > to answer the query. However, DimB will appear in NONE of the cuboids > > due > > > > to derived optimization. In this case, we modify the execution plan > to > > > make > > > > it group by DimA(its host column) first, we'll get intermediate > answer > > > > like: > > > > > > > > DimA count(*) > > > > 1 1 > > > > 2 1 > > > > 3 1 > > > > 4 1 > > > > > > > > Afterwards, Kylin will replace DimA values with DimB values(since > both > > of > > > > their values are in lookup table, Kylin can load the whole lookup > table > > > > into memory and build a mapping for them), and the intermediate > result > > > > becomes: > > > > > > > > DimB count(*) > > > > a 1 > > > > b 1 > > > > c 1 > > > > a 1 > > > > > > > > After this, the runtime SQL engine(calcite) will further aggregate > the > > > > intermediate result to: > > > > > > > > DimB count(*) > > > > a 2 > > > > b 1 > > > > c 1 > > > > > > > > this step happens at query runtime, this is what it means "at the > cost > > of > > > > extra runtime aggregation" > > > > > > > > > > > > > > > > > > > > On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]> > > > wrote: > > > > > > > > > Sorry to be a bit annoying with the topic but I tried different > > cubes / > > > > > hierarchies and can never join. > > > > > > > > > > Without this basically I cannot use Kylin on PROD for my project. > > > > > > > > > > The stack trace: > > > > > > > > > > http-bio-7070-exec-3]:[2015-07-31 > > > > > > > > > > > > > > > > > > > > 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)] > > > > > - > > > > > > > > > > org.apache.kylin.rest.exception.InternalErrorException: Can't find > > any > > > > > realization. Please confirm with providers. SQL digest: fact table > > > > > DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on > > [],with > > > > > aggregates[]. > > > > > > > > > > while executing SQL: "select id from sample_dim group by id LIMIT > > > 50000" > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryController.java:223) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.java:174) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.controller.QueryController.query(QueryController.java:91) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039d0b.invoke(<generated>) > > > > > > > > > > at > > net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:48) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodInterceptor.java:34) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(AbstractMetricMethodInterceptor.java:59) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b607924.query(<generated>) > > > > > > > > > > at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown > > > Source) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > > > > > > > > > at java.lang.reflect.Method.invoke(Method.java:606) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789) > > > > > > > > > > at > > javax.servlet.http.HttpServlet.service(HttpServlet.java:646) > > > > > > > > > > at > > javax.servlet.http.HttpServlet.service(HttpServlet.java:727) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:97) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:150) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.authentication.ui.DefaultLoginPageGeneratingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilter.java:64) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195) > > > > > > > > > > at > > > > > > > com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) > > > > > > > > > > at > > > > > > > > > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) > > > > > > > > > > at > > > > > > > > > > > > > > > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) > > > > > > > > > > at java.lang.Thread.run(Thread.java:744) > > > > > > > > > > > > > > > The result for "select * from sample_dim" > > > > > > > > > > ID,DIM1,DIM2 > > > > > > > > > > 33814,NYC,USA > > > > > > > > > > 201431,PARIS,FRANCE > > > > > > > > > > etc. > > > > > > > > > > > > > > > > > > > > On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected] > > > > > > wrote: > > > > > > > > > > > So with 0.7.2 the cube builds, and I can see some improvement: > > > > > > > > > > > > "select * from SAMPLE_DIM" now returns all the fields, i.e: > > > > > > > > > > > > dim1, dim2, dim3, etc., SAMPLE_ID > > > > > > > > > > > > and I can see all the values for each field. > > > > > > > > > > > > However the join between the fact table and the lookup table > still > > > does > > > > > > not work, it returns: > > > > > > > > > > > > Can't find any realization. > > > > > > > > > > > > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" > > it > > > > also > > > > > > returns: > > > > > > > > > > > > Can't find any realization. > > > > > > > > > > > > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" > then > > I > > > > get > > > > > > the list of all SAMPLE_ID as expected. > > > > > > > > > > > > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get > the > > > list > > > > > of > > > > > > all dim1 as expected. > > > > > > > > > > > > The same exact query works perfectly on Hive (although it takes a > > > long > > > > > > time to be processed of course). > > > > > > > > > > > > Am I doing something wrong? > > > > > > > > > > > > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > >> Ok I guess this is > > https://issues.apache.org/jira/browse/KYLIN-831, > > > > > >> right? > > > > > >> > > > > > >> I upgraded today to 0.7.2 and hope it solves the problem then. > > > > > >> > > > > > >> Regards > > > > > >> > > > > > >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo < > > [email protected]> > > > > > >> wrote: > > > > > >> > > > > > >>> I still don't understand this. > > > > > >>> > > > > > >>> I have a simple fact table and a simple SAMPLE_DIM lookup > table. > > > They > > > > > >>> are joined on SAMPLE_ID. > > > > > >>> > > > > > >>> If I do like you say and include all the columns of SAMPLE_DIM > > as a > > > > > >>> hierarchy and do not include the SAMPLE_ID then the cube builds > > > > > >>> successfully but I cannot query with the hierarchy. Any join > > > results > > > > in > > > > > >>> this error: > > > > > >>> > > > > > >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM' > > > > > >>> > > > > > >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the > > > > hierarchy > > > > > >>> but not the SAMPLE_ID used to join with the fact table. > > > > > >>> > > > > > >>> If I include the SAMPLE_ID in the hierarchy definition then the > > > cube > > > > > >>> build fails on step 3 with: > > > > > >>> > > > > > >>> java.lang.NullPointerException: Column > > DEFAULT.FACT_TABLE.SAMPLE_ID > > > > > does > > > > > >>> not exist in row key desc > > > > > >>> at > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158) > > > > > >>> at > > > > > >>> > > > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53) > > > > > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > > > > > >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > >>> at > > > > > >>> > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > >>> at java.lang.Thread.run(Thread.java:744) > > > > > >>> > > > > > >>> (the SAMPLE_ID *does* exist in the FACT_TABLE) > > > > > >>> > > > > > >>> The only scenario I could make it work is when I also create a > > > > derived > > > > > >>> dimension SAMPLE_ID / something else, then somehow the > SAMPLE_ID > > is > > > > > >>> included and can be queried. > > > > > >>> > > > > > >>> Any help with that? > > > > > >>> > > > > > >>> > > > > > >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo < > > [email protected] > > > > > > > > > >>> wrote: > > > > > >>> > > > > > >>>> Thanks for the answer, > > > > > >>>> > > > > > >>>> Indeed I had a look at these slides before and it's great to > > > > > understand > > > > > >>>> the high level concepts but I ended up spending quite some > time > > > when > > > > > >>>> designing my dimensions with the issues mentioned below. > > > > > >>>> > > > > > >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong < > > > > [email protected] > > > > > > > > > > > >>>> wrote: > > > > > >>>> > > > > > >>>>> Hi Alex, > > > > > >>>>> > > > > > >>>>> We have a slide to hlep you understand how to build cube.I > > don't > > > > know > > > > > >>>>> whether you have read this? This will hlep you understand > > derived > > > > and > > > > > >>>>> hierarchy. > > > > > >>>>> > > > > > >>>>> > http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin > > > > > >>>>> > > > > > >>>>> for your case about hierarchy,log_date should not be included > > in > > > > > >>>>> hierarchy > > > > > >>>>> ,here's a bug you help find it.we will follow this. > > > > > >>>>> > > > > > >>>>> also .more document and UI enhancement will be done to help > > user > > > > > build > > > > > >>>>> cube > > > > > >>>>> easily. > > > > > >>>>> > > > > > >>>>> Thanks!! > > > > > >>>>> > > > > > >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo < > > > [email protected] > > > > > > > > > > >>>>> wrote: > > > > > >>>>> > > > > > >>>>> > I am trying to create a simple cube with a fact table and 3 > > > > > >>>>> dimensions. > > > > > >>>>> > > > > > > >>>>> > I have read the different slideshares and wiki pages, but I > > > found > > > > > >>>>> that the > > > > > >>>>> > documentation is not very specific on how to manage > > > hierarchies. > > > > > >>>>> > > > > > > >>>>> > Let's take this simple example : > > > > > >>>>> > > > > > > >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc. > > > > > >>>>> > > > > > > >>>>> > Date lookup table : logDate, week, month, quarter, etc. > > > > > >>>>> > > > > > > >>>>> > I specified Left join on logDate, actually when I specify > > this > > > I > > > > > >>>>> find it > > > > > >>>>> > not very clear which one is considered to be the Left table > > and > > > > > >>>>> which one > > > > > >>>>> > is considered to be the Right table. I assumed the Fact > table > > > was > > > > > >>>>> the left > > > > > >>>>> > table and the Lookup table the right table, looking at it > > now I > > > > > >>>>> think that > > > > > >>>>> > might be a mistake (I am just interested in dates for which > > > there > > > > > are > > > > > >>>>> > results in the fact table). > > > > > >>>>> > > > > > > >>>>> > If I use the auto generator it creates a derived > dimension, I > > > > don't > > > > > >>>>> think > > > > > >>>>> > that's what I need. > > > > > >>>>> > > > > > > >>>>> > So I created a hierarchy, but again to me it's clearly > > > indicated > > > > > if I > > > > > >>>>> > should create ["quarter", "month", "week", "log_date"] or > > > > > ["logDate", > > > > > >>>>> > "week", "month", "quarter"]? > > > > > >>>>> > > > > > > >>>>> > Also should I include log_date in the hierarchy? To me it > was > > > > more > > > > > >>>>> > intuitive not to include it because it's already the join, > > but > > > it > > > > > >>>>> created > > > > > >>>>> > the cube without it and I cannot query by date, it says > that > > > > > >>>>> "log_date" is > > > > > >>>>> > not found in the date table (it is in the Hive table but > not > > > the > > > > > cube > > > > > >>>>> > built). If I include it in the hierarchy the cube build > fails > > > > with > > > > > >>>>> this > > > > > >>>>> > error : > > > > > >>>>> > > > > > > >>>>> > java.lang.NullPointerException: Column > > > > DEFAULT.DATE_TABLE.LOG_DATE > > > > > >>>>> > does not exist in row key desc > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53) > > > > > >>>>> > at > > > > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > > > > > >>>>> > at > > > > > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > >>>>> > at > > > > > >>>>> > > > > > > >>>>> > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > >>>>> > at java.lang.Thread.run(Thread.java:744) > > > > > >>>>> > > > > > > >>>>> > result code:2 > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> > I think it might be useful to improve the documentation to > > > > explain > > > > > >>>>> this > > > > > >>>>> > more clearly and not just the basic steps because building > a > > > cube > > > > > >>>>> even on > > > > > >>>>> > short time ranges takes some time so learning by trial / > > error > > > is > > > > > >>>>> very time > > > > > >>>>> > consuming. > > > > > >>>>> > > > > > > >>>>> > Same thing for the derived dimensions, should I include > > > > ["storeID", > > > > > >>>>> > "storeName"] or just ["storeName"]? The second option seems > > to > > > > work > > > > > >>>>> for me. > > > > > >>>>> > > > > > > >>>>> > Thanks > > > > > >>>>> > > > > > > >>>>> > > > > > >>>> > > > > > >>>> > > > > > >>> > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > *Bin Mahone | 马洪宾* > > > > Apache Kylin: http://kylin.io > > > > Github: https://github.com/binmahone > > > > > > > > > > > > > > > > -- > > > Regards, > > > > > > *Bin Mahone | 马洪宾* > > > Apache Kylin: http://kylin.io > > > Github: https://github.com/binmahone > > > > > > > > > -- > Regards, > > *Bin Mahone | 马洪宾* > Apache Kylin: http://kylin.io > Github: https://github.com/binmahone >
