Thanks for the answer. Indeed, I had a look at these slides before, and they are great for understanding the high-level concepts, but I still ended up spending quite some time on the issues mentioned below when designing my dimensions.
On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <[email protected]> wrote:

> Hi Alex,
>
> We have a slide to help you understand how to build cube. I don't know
> whether you have read this? This will help you understand derived and
> hierarchy.
>
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>
> for your case about hierarchy, log_date should not be included in hierarchy,
> here's a bug you help find it. we will follow this.
>
> also, more document and UI enhancement will be done to help user build cube
> easily.
>
> Thanks!!
>
> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]> wrote:
>
> > I am trying to create a simple cube with a fact table and 3 dimensions.
> >
> > I have read the different slideshares and wiki pages, but I found that the
> > documentation is not very specific on how to manage hierarchies.
> >
> > Let's take this simple example :
> >
> > Fact table: productID, storeID, logDate, numbOfSell, etc.
> >
> > Date lookup table : logDate, week, month, quarter, etc.
> >
> > I specified Left join on logDate, actually when I specify this I find it
> > not very clear which one is considered to be the Left table and which one
> > is considered to be the Right table. I assumed the Fact table was the left
> > table and the Lookup table the right table, looking at it now I think that
> > might be a mistake (I am just interested in dates for which there are
> > results in the fact table).
> >
> > If I use the auto generator it creates a derived dimension, I don't think
> > that's what I need.
> >
> > So I created a hierarchy, but again to me it's clearly indicated if I
> > should create ["quarter", "month", "week", "log_date"] or ["logDate",
> > "week", "month", "quarter"]?
> >
> > Also should I include log_date in the hierarchy? To me it was more
> > intuitive not to include it because it's already the join, but it created
> > the cube without it and I cannot query by date, it says that "log_date" is
> > not found in the date table (it is in the Hive table but not the cube
> > built). If I include it in the hierarchy the cube build fails with this
> > error :
> >
> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE does not exist in row key desc
> >     at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
> >     at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
> >     at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> > result code:2
> >
> > I think it might be useful to improve the documentation to explain this
> > more clearly and not just the basic steps because building a cube even on
> > short time ranges takes some time so learning by trial / error is very time
> > consuming.
> >
> > Same thing for the derived dimensions, should I include ["storeID",
> > "storeName"] or just ["storeName"]? The second option seems to work for me.
> >
> > Thanks
