Hongbin,

Nice explanation. Would make a great blog article.

Regards
Seshu

On 8/2/15, 8:32 PM, "hongbin ma" <[email protected]> wrote:

>hi alex,
>
>I'm not quite following this thread. It looks like, apart from jason
>recommending the slideshare link, no one has replied to you. To whom are
>you communicating? (Is someone replying to you in person instead of
>to the dev list?)
>
>Can you summarize your problem again? I might be able to help you.
>It seems you're confused by hierarchies and derived columns. Here are some
>take-aways; I'm currently writing them up as formal docs:
>
>*Hierarchies:*
>
>Theoretically, for N dimensions you'll end up with 2^N dimension
>combinations. However, for some groups of dimensions there is no need to
>create so many combinations. For example, suppose you have three dimensions:
>continent, country, city (in hierarchies, the "bigger" dimension comes
>first). You will only need the following three group-by combinations
>when you do drill-down analysis:
>
>group by continent
>group by continent, country
>group by continent, country, city
>
>In such cases the combination count is reduced from 2^3=8 to 3, which is a
>great optimization. The same goes for the YEAR,QUARTER,MONTH,DATE case.
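
The counting above can be checked with a minimal sketch. This is plain Python for illustration, not Kylin code; the function names are hypothetical, and the 2^N figure is taken to include the empty (grand total) combination.

```python
from itertools import combinations


def all_combinations(dims):
    """Every subset of dims, including the empty one: 2^N in total."""
    return [c for r in range(len(dims) + 1) for c in combinations(dims, r)]


def hierarchy_combinations(dims):
    """Only the non-empty prefixes of a hierarchy listed "bigger" first."""
    return [tuple(dims[:i]) for i in range(1, len(dims) + 1)]


dims = ["continent", "country", "city"]
print(len(all_combinations(dims)))   # number of unrestricted combinations
for combo in hierarchy_combinations(dims):
    print("group by " + ", ".join(combo))
```

The hierarchy version keeps only the prefixes, which is why three dimensions need three group-by combinations instead of eight.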
>
>If we denote the hierarchy dimensions as H1,H2,H3, typical scenarios would
>be:
>
>*A. Hierarchies on lookup table*
>
>Fact  Table                            (joins)         Lookup Table
>===================                         =============
>column1,column2,,,,,, FK                         PK,,H1,H2,H3,,,,
>
>*B. Hierarchies on fact table*
>
>Fact  Table
>===========================
>column1,column2,,,H1,H2,H3,,,,,,,
>
>There is a special case of scenario A, where the PK of the lookup table
>happens to be part of the hierarchy. For example, we have a calendar
>lookup table where cal_dt is the primary key:
>
>*A*. Hierarchies on lookup table over its primary key*
>
>Lookup Table(Calendar)
>==============================================
>cal_dt(PK), week_beg_dt, month_beg_dt, quarter_beg_dt,,,
>
>For cases like A*, what you need is another optimization called "Derived
>Columns".
>
>*Derived Columns:*
>
>A derived column is used when one or more dimensions (they must be
>dimensions on the lookup table; these columns are called "derived") can be
>deduced from another column (usually the corresponding FK; this is called
>the "host column").
>
>For example, suppose we have a lookup table that we join to the fact table
>with "where DimA = DimX". Notice that in Kylin, if you choose the FK as a
>dimension, the corresponding PK will be automatically queryable, without
>any extra cost. The secret is that since FK and PK are always identical,
>Kylin can apply filters/group-bys on the FK first, and transparently
>replace them with the PK. This means that if we want DimA(FK), DimX(PK),
>DimB and DimC in our cube, we can safely choose DimA, DimB and DimC only.
>
>
>Fact  Table                    (joins)         Lookup Table
>========================                       ====================
>column1,column2,,,,,, DimA(FK)                 DimX(PK),,DimB, DimC
>
>Let's say that DimA (the dimension representing FK/PK) has a special
>mapping to DimB:
>
>dimA    dimB  dimC
>1           a        ?
>2           b        ?
>3           c        ?
>4           a        ?
>
>In this case, given a value in DimA, the value of DimB is determined, so
>we say DimB can be derived from DimA. When we build a cube that contains
>both DimA and DimB, we simply include DimA and mark DimB as derived.
>The derived column (DimB) does not participate in cuboid generation:
>
>original combinations:
>ABC,AB,AC,BC,A,B,C
>
>combinations when deriving B from A:
>AC,A,C
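
The two lists above can be reproduced with a small sketch. It is illustrative Python, not how Kylin enumerates cuboids internally; `cuboids` and `cuboids_with_derived` are hypothetical helper names.

```python
from itertools import combinations


def cuboids(dims):
    """All non-empty dimension combinations, largest first."""
    return ["".join(c)
            for r in range(len(dims), 0, -1)
            for c in combinations(dims, r)]


def cuboids_with_derived(dims, derived):
    """Derived dimensions are dropped before combinations are enumerated."""
    kept = [d for d in dims if d not in derived]
    return cuboids(kept)


print(",".join(cuboids(["A", "B", "C"])))
print(",".join(cuboids_with_derived(["A", "B", "C"], {"B"})))
```

Marking B as derived removes it from the enumeration entirely, so only the combinations of A and C remain.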
>
>At runtime, a query like "select count(*) from fact_table inner
>join lookup1 group by lookup1.dimB" expects a cuboid containing DimB
>to answer it. However, DimB appears in NONE of the cuboids due
>to the derived optimization. In this case, we modify the execution plan to
>make it group by DimA (its host column) first, and we get an intermediate
>answer like:
>
>DimA  count(*)
>1          1
>2          1
>3          1
>4          1
>
>Afterwards, Kylin replaces the DimA values with DimB values (since both
>are in the lookup table, Kylin can load the whole lookup table
>into memory and build a mapping between them), and the intermediate result
>becomes:
>
>DimB  count(*)
>a          1
>b          1
>c          1
>a          1
>
>After this, the runtime SQL engine (Calcite) further aggregates the
>intermediate result to:
>
>DimB  count(*)
>a          2
>b          1
>c          1
>
>This step happens at query runtime; this is what "at the cost of
>extra runtime aggregation" means.
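
The three runtime steps (group by the host column, map to the derived column, aggregate again) can be sketched as follows. This is only an illustration in Python using the sample values from the tables above; the function name is hypothetical, and in Kylin the real work is done by the query engine, not by code like this.

```python
from collections import defaultdict

# Lookup-table mapping from host column (DimA) to derived column (DimB),
# taken from the example table in the message.
dim_a_to_dim_b = {1: "a", 2: "b", 3: "c", 4: "a"}

# Intermediate answer: count(*) grouped by DimA, as read from the cuboid.
count_by_dim_a = {1: 1, 2: 1, 3: 1, 4: 1}


def regroup_by_derived(count_by_host, host_to_derived):
    """Replace host values with derived values, then sum rows that collide."""
    result = defaultdict(int)
    for host_value, count in count_by_host.items():
        result[host_to_derived[host_value]] += count
    return dict(result)


print(regroup_by_derived(count_by_dim_a, dim_a_to_dim_b))
```

The second summation is the extra runtime aggregation: rows for DimA values 1 and 4 both map to "a" and have to be merged at query time.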
>
>
>
>
>On Sat, Aug 1, 2015 at 12:54 AM, alex schufo <[email protected]> wrote:
>
>> Sorry to be a bit annoying with the topic but I tried different cubes /
>> hierarchies and can never join.
>>
>> Without this basically I cannot use Kylin on PROD for my project.
>>
>> The stack trace:
>>
>> http-bio-7070-exec-3]:[2015-07-31 09:42:06,337][ERROR][org.apache.kylin.rest.controller.BasicController.handleError(BasicController.java:52)] -
>>
>> org.apache.kylin.rest.exception.InternalErrorException: Can't find any
>> realization. Please confirm with providers. SQL digest: fact table
>> DEFAULT.SAMPLE_DIM,group by [DEFAULT.SAMPLE_DIM.ID],filter on [],with
>> aggregates[].
>>
>> while executing SQL: "select id from sample_dim group by id LIMIT 50000"
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.controller.QueryController.doQueryInternal(QueryCon
>>troller.java:223)
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.controller.QueryController.doQuery(QueryController.
>>java:174)
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.controller.QueryController.query(QueryController.ja
>>va:91)
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.controller.QueryController$$FastClassByCGLIB$$fc039
>>d0b.invoke(<generated>)
>>
>>         at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
>>
>>         at
>>
>> 
>>org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.in
>>vokeJoinpoint(Cglib2AopProxy.java:689)
>>
>>         at
>>
>> 
>>org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
>>ectiveMethodInvocation.java:150)
>>
>>         at
>>
>> 
>>com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodIn
>>terceptor.java:48)
>>
>>         at
>>
>> 
>>com.ryantenney.metrics.spring.TimedMethodInterceptor.invoke(TimedMethodIn
>>terceptor.java:34)
>>
>>         at
>>
>> 
>>com.ryantenney.metrics.spring.AbstractMetricMethodInterceptor.invoke(Abst
>>ractMetricMethodInterceptor.java:59)
>>
>>         at
>>
>> 
>>org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
>>ectiveMethodInvocation.java:172)
>>
>>         at
>>
>> 
>>org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedIntercepto
>>r.intercept(Cglib2AopProxy.java:622)
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.controller.QueryController$$EnhancerByCGLIB$$5b6079
>>24.query(<generated>)
>>
>>         at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
>>
>>         at
>>
>> 
>>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
>>mpl.java:43)
>>
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>
>>         at
>>
>> 
>>org.springframework.web.method.support.InvocableHandlerMethod.invoke(Invo
>>cableHandlerMethod.java:213)
>>
>>         at
>>
>> 
>>org.springframework.web.method.support.InvocableHandlerMethod.invokeForRe
>>quest(InvocableHandlerMethod.java:126)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHan
>>dlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandl
>>erAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandl
>>erAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.h
>>andle(AbstractHandlerMethodAdapter.java:80)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherSe
>>rvlet.java:923)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.DispatcherServlet.doService(DispatcherSer
>>vlet.java:852)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.FrameworkServlet.processRequest(Framework
>>Servlet.java:882)
>>
>>         at
>>
>> 
>>org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.
>>java:789)
>>
>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
>>
>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:303)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>> org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:241)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>>
>> 
>>com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(Abstract
>>InstrumentedFilter.java:97)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:241)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:330)
>>
>>         at
>>
>> 
>>org.springframework.security.web.access.intercept.FilterSecurityIntercept
>>or.invoke(FilterSecurityInterceptor.java:118)
>>
>>         at
>>
>> 
>>org.springframework.security.web.access.intercept.FilterSecurityIntercept
>>or.doFilter(FilterSecurityInterceptor.java:84)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.access.ExceptionTranslationFilter.doFilt
>>er(ExceptionTranslationFilter.java:113)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.session.SessionManagementFilter.doFilter
>>(SessionManagementFilter.java:103)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>        at
>>
>> 
>>org.springframework.security.web.authentication.AnonymousAuthenticationFi
>>lter.doFilter(AnonymousAuthenticationFilter.java:113)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.servletapi.SecurityContextHolderAwareReq
>>uestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doF
>>ilter(RequestCacheAwareFilter.java:45)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.authentication.www.BasicAuthenticationFi
>>lter.doFilter(BasicAuthenticationFilter.java:150)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.authentication.ui.DefaultLoginPageGenera
>>tingFilter.doFilter(DefaultLoginPageGeneratingFilter.java:91)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.authentication.AbstractAuthenticationPro
>>cessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.authentication.logout.LogoutFilter.doFil
>>ter(LogoutFilter.java:105)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.context.SecurityContextPersistenceFilter
>>.doFilter(SecurityContextPersistenceFilter.java:87)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFi
>>lter(FilterChainProxy.java:342)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy.doFilterInternal(Filter
>>ChainProxy.java:192)
>>
>>         at
>>
>> 
>>org.springframework.security.web.FilterChainProxy.doFilter(FilterChainPro
>>xy.java:160)
>>
>>         at
>>
>> 
>>org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(Deleg
>>atingFilterProxy.java:346)
>>
>>         at
>>
>> 
>>org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingF
>>ilterProxy.java:259)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:241)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>>
>> 
>>org.apache.kylin.rest.filter.KylinApiFilter.doFilterInternal(KylinApiFilt
>>er.java:64)
>>
>>         at
>>
>> 
>>org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerReque
>>stFilter.java:76)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:241)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:195)
>>
>>         at
>> com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:266)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat
>>ionFilterChain.java:241)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilte
>>rChain.java:208)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve
>>.java:220)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve
>>.java:122)
>>
>>         at
>>
>> 
>>org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorB
>>ase.java:504)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:
>>170)
>>
>>         at
>>
>> 
>>org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:
>>103)
>>
>>         at
>> 
>>org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
>>
>>         at
>>
>> 
>>org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.j
>>ava:116)
>>
>>         at
>> 
>>org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:42
>>1)
>>
>>         at
>>
>> 
>>org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Pr
>>ocessor.java:1074)
>>
>>         at
>>
>> 
>>org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Abst
>>ractProtocol.java:611)
>>
>>         at
>>
>> 
>>org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.ja
>>va:316)
>>
>>         at
>>
>> 
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
>>:1145)
>>
>>         at
>>
>> 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
>>a:615)
>>
>>         at
>>
>> 
>>org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread
>>.java:61)
>>
>>         at java.lang.Thread.run(Thread.java:744)
>>
>>
>> The result for "select * from sample_dim"
>>
>> ID,DIM1,DIM2
>>
>> 33814,NYC,USA
>>
>> 201431,PARIS,FRANCE
>>
>> etc.
>>
>>
>>
>> On Wed, Jul 29, 2015 at 3:37 PM, alex schufo <[email protected]>
>>wrote:
>>
>> > So with 0.7.2 the cube builds, and I can see some improvement:
>> >
>> > "select * from SAMPLE_DIM" now returns all the fields, i.e:
>> >
>> > dim1, dim2, dim3, etc., SAMPLE_ID
>> >
>> > and I can see all the values for each field.
>> >
>> > However the join between the fact table and the lookup table still
>>does
>> > not work, it returns:
>> >
>> >     Can't find any realization.
>> >
>> > And if I do "select SAMPLE_ID from SAMPLE_DIM group by SAMPLE_ID" it
>>also
>> > returns:
>> >
>> >     Can't find any realization.
>> >
>> > If I do "select SAMPLE_ID from FACT_TABLE group by SAMPLE_ID" then I
>>get
>> > the list of all SAMPLE_ID as expected.
>> >
>> > If I do "select dim1 from SAMPLE_DIM group by dim1" I also get the
>>list
>> of
>> > all dim1 as expected.
>> >
>> > The same exact query works perfectly on Hive (although it takes a long
>> > time to be processed of course).
>> >
>> > Am I doing something wrong?
>> >
>> > On Wed, Jul 29, 2015 at 1:35 PM, alex schufo <[email protected]>
>> wrote:
>> >
>> >> Ok I guess this is https://issues.apache.org/jira/browse/KYLIN-831,
>> >> right?
>> >>
>> >> I upgraded today to 0.7.2 and hope it solves the problem then.
>> >>
>> >> Regards
>> >>
>> >> On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]>
>> >> wrote:
>> >>
>> >>> I still don't understand this.
>> >>>
>> >>> I have a simple fact table and a simple SAMPLE_DIM lookup table.
>>They
>> >>> are joined on SAMPLE_ID.
>> >>>
>> >>> If I do like you say and include all the columns of SAMPLE_DIM as a
>> >>> hierarchy and do not include the SAMPLE_ID then the cube builds
>> >>> successfully but I cannot query with the hierarchy. Any join
>>results in
>> >>> this error:
>> >>>
>> >>> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
>> >>>
>> >>> Indeed if I do a select * from 'SAMPLE_DIM' I can see all the
>>hierarchy
>> >>> but not the SAMPLE_ID used to join with the fact table.
>> >>>
>> >>> If I include the SAMPLE_ID in the hierarchy definition then the cube
>> >>> build fails on step 3 with:
>> >>>
>> >>> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID
>> does
>> >>> not exist in row key desc
>> >>> at
>> org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:16
>>3)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
>>yGeneratorCLI.java:51)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
>>yGeneratorCLI.java:42)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionary
>>Job.java:53)
>> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecu
>>table.java:63)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
>>ble.java:107)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultCha
>>inedExecutable.java:50)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
>>ble.java:107)
>> >>> at
>> >>>
>> 
>>org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defau
>>ltScheduler.java:132)
>> >>> at
>> >>>
>> 
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
>>:1145)
>> >>> at
>> >>>
>> 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
>>a:615)
>> >>> at java.lang.Thread.run(Thread.java:744)
>> >>>
>> >>> (the SAMPLE_ID *does* exist in the FACT_TABLE)
>> >>>
>> >>> The only scenario I could make it work is when I also create a
>>derived
>> >>> dimension SAMPLE_ID / something else, then somehow the SAMPLE_ID is
>> >>> included and can be queried.
>> >>>
>> >>> Any help with that?
>> >>>
>> >>>
>> >>> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Thanks for the answer,
>> >>>>
>> >>>> Indeed I had a look at these slides before and it's great to
>> understand
>> >>>> the high level concepts but I ended up spending quite some time
>>when
>> >>>> designing my dimensions with the issues mentioned below.
>> >>>>
>> >>>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong
>><[email protected]
>> >
>> >>>> wrote:
>> >>>>
>> >>>>> Hi Alex,
>> >>>>>
>> >>>>> We have a slide deck to help you understand how to build a cube. I
>> >>>>> don't know whether you have read it? It will help you understand
>> >>>>> derived and hierarchy dimensions.
>> >>>>>
>> >>>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>> >>>>>
>> >>>>> As for your hierarchy case, log_date should not be included in the
>> >>>>> hierarchy. This is a bug that you helped find; we will follow up on it.
>> >>>>>
>> >>>>> Also, more documentation and UI enhancements will be done to help
>> >>>>> users build cubes easily.
>> >>>>>
>> >>>>> Thanks!!
>> >>>>>
>> >>>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo
>><[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>> > I am trying to create a simple cube with a fact table and 3
>> >>>>> dimensions.
>> >>>>> >
>> >>>>> > I have read the different slideshares and wiki pages, but I
>>found
>> >>>>> that the
>> >>>>> > documentation is not very specific on how to manage hierarchies.
>> >>>>> >
>> >>>>> > Let's take this simple example :
>> >>>>> >
>> >>>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
>> >>>>> >
>> >>>>> > Date lookup table : logDate, week, month, quarter, etc.
>> >>>>> >
>> >>>>> > I specified Left join on logDate, actually when I specify this I
>> >>>>> find it
>> >>>>> > not very clear which one is considered to be the Left table and
>> >>>>> which one
>> >>>>> > is considered to be the Right table. I assumed the Fact table
>>was
>> >>>>> the left
>> >>>>> > table and the Lookup table the right table, looking at it now I
>> >>>>> think that
>> >>>>> > might be a mistake (I am just interested in dates for which
>>there
>> are
>> >>>>> > results in the fact table).
>> >>>>> >
>> >>>>> > If I use the auto generator it creates a derived dimension, I
>>don't
>> >>>>> think
>> >>>>> > that's what I need.
>> >>>>> >
>> >>>>> > So I created a hierarchy, but again to me it's not clearly
>> >>>>> > indicated whether I should create ["quarter", "month", "week",
>> >>>>> > "log_date"] or ["logDate", "week", "month", "quarter"].
>> >>>>> >
>> >>>>> > Also should I include log_date in the hierarchy? To me it was
>>more
>> >>>>> > intuitive not to include it because it's already the join, but
>>it
>> >>>>> created
>> >>>>> > the cube without it and I cannot query by date, it says that
>> >>>>> "log_date" is
>> >>>>> > not found in the date table (it is in the Hive table but not the
>> cube
>> >>>>> > built). If I include it in the hierarchy the cube build fails
>>with
>> >>>>> this
>> >>>>> > error :
>> >>>>> >
>> >>>>> > java.lang.NullPointerException: Column
>>DEFAULT.DATE_TABLE.LOG_DATE
>> >>>>> > does not exist in row key desc
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:16
>>3)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
>>yGeneratorCLI.java:51)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(Dictionar
>>yGeneratorCLI.java:42)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionary
>>Job.java:53)
>> >>>>> >         at
>> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>>>> >         at
>> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecu
>>table.java:63)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
>>ble.java:107)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultCha
>>inedExecutable.java:50)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecuta
>>ble.java:107)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(Defau
>>ltScheduler.java:132)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
>>:1145)
>> >>>>> >         at
>> >>>>> >
>> >>>>>
>> 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
>>a:615)
>> >>>>> >         at java.lang.Thread.run(Thread.java:744)
>> >>>>> >
>> >>>>> > result code:2
>> >>>>> >
>> >>>>> >
>> >>>>> > I think it might be useful to improve the documentation to
>>explain
>> >>>>> this
>> >>>>> > more clearly and not just the basic steps because building a
>>cube
>> >>>>> even on
>> >>>>> > short time ranges takes some time so learning by trial / error
>>is
>> >>>>> very time
>> >>>>> > consuming.
>> >>>>> >
>> >>>>> > Same thing for the derived dimensions, should I include
>>["storeID",
>> >>>>> > "storeName"] or just ["storeName"]? The second option seems to
>>work
>> >>>>> for me.
>> >>>>> >
>> >>>>> > Thanks
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>
>
>-- 
>Regards,
>
>*Bin Mahone | 马洪宾*
>Apache Kylin: http://kylin.io
>Github: https://github.com/binmahone
