I still don't understand this. I have a simple fact table and a simple SAMPLE_DIM lookup table. They are joined on SAMPLE_ID.
If I do as you say and include all the columns of SAMPLE_DIM as a hierarchy, without SAMPLE_ID, then the cube builds successfully but I cannot query with the hierarchy. Any join results in this error:

Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'

Indeed, if I do a select * from 'SAMPLE_DIM' I can see all the hierarchy columns but not the SAMPLE_ID used to join with the fact table.

If I include SAMPLE_ID in the hierarchy definition, then the cube build fails on step 3 with:

java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID does not exist in row key desc
    at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
    at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
    at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
    at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

(SAMPLE_ID *does* exist in FACT_TABLE.)

The only scenario in which I could make it work is when I also create a derived dimension SAMPLE_ID / something else; then somehow SAMPLE_ID is included and can be queried.

Any help with that?

On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]> wrote:

> Thanks for the answer,
>
> Indeed I had a look at these slides before and it's great to understand
> the high level concepts but I ended up spending quite some time when
> designing my dimensions with the issues mentioned below.
>
> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <[email protected]>
> wrote:
>
>> Hi Alex,
>>
>> We have a slide deck to help you understand how to build a cube. I don't
>> know whether you have read it? It will help you understand derived
>> dimensions and hierarchies.
>>
>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>>
>> For your case about hierarchy, log_date should not be included in the
>> hierarchy. This is a bug you helped find; we will follow up on it.
>>
>> Also, more documentation and UI enhancements will be done to help users
>> build cubes easily.
>>
>> Thanks!!
>>
>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]>
>> wrote:
>>
>> > I am trying to create a simple cube with a fact table and 3 dimensions.
>> >
>> > I have read the different slideshares and wiki pages, but I found that
>> > the documentation is not very specific on how to manage hierarchies.
>> >
>> > Let's take this simple example:
>> >
>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
>> >
>> > Date lookup table: logDate, week, month, quarter, etc.
>> >
>> > I specified Left join on logDate. Actually, when I specify this I find
>> > it not very clear which one is considered to be the Left table and
>> > which one is considered to be the Right table. I assumed the Fact table
>> > was the left table and the Lookup table the right table; looking at it
>> > now I think that might be a mistake (I am just interested in dates for
>> > which there are results in the fact table).
>> >
>> > If I use the auto generator it creates a derived dimension, I don't
>> > think that's what I need.
>> >
>> > So I created a hierarchy, but again to me it's not clearly indicated
>> > whether I should create ["quarter", "month", "week", "log_date"] or
>> > ["logDate", "week", "month", "quarter"]?
>> >
>> > Also should I include log_date in the hierarchy? To me it was more
>> > intuitive not to include it because it's already the join, but it
>> > created the cube without it and I cannot query by date, it says that
>> > "log_date" is not found in the date table (it is in the Hive table but
>> > not the cube built). If I include it in the hierarchy the cube build
>> > fails with this error:
>> >
>> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE does not exist in row key desc
>> >     at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
>> >     at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
>> >     at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
>> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
>> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>> >     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >     at java.lang.Thread.run(Thread.java:744)
>> >
>> > result code:2
>> >
>> > I think it might be useful to improve the documentation to explain this
>> > more clearly and not just the basic steps, because building a cube even
>> > on short time ranges takes some time, so learning by trial / error is
>> > very time consuming.
>> >
>> > Same thing for the derived dimensions, should I include ["storeID",
>> > "storeName"] or just ["storeName"]? The second option seems to work for
>> > me.
>> >
>> > Thanks
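
On the left/right question raised in the original message, here is a minimal sketch in plain SQL (run through Python's sqlite3, not Kylin) of what a left join from the fact table to the date lookup table returns. The table and column names follow the example in the thread; the sample rows, the week/month/quarter labels, and the helper name sales_by are made up for illustration. It only shows generic SQL join semantics and says nothing about how Kylin stores or resolves the join column internally.

```python
import sqlite3

# Lookup columns we allow grouping by (hypothetical helper, not a Kylin API).
GROUPABLE = {"logDate", "week", "month", "quarter"}


def sales_by(column):
    """Sum numbOfSell per value of a date_table column, fact table on the left."""
    if column not in GROUPABLE:
        raise ValueError("unknown lookup column: " + column)

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE fact_table "
        "(productID INTEGER, storeID INTEGER, logDate TEXT, numbOfSell INTEGER)"
    )
    cur.execute(
        "CREATE TABLE date_table "
        "(logDate TEXT PRIMARY KEY, week TEXT, month TEXT, quarter TEXT)"
    )
    # Made-up sample data: three sales on two distinct dates.
    cur.executemany(
        "INSERT INTO fact_table VALUES (?, ?, ?, ?)",
        [(1, 10, "2015-06-12", 5), (2, 10, "2015-06-12", 3), (1, 11, "2015-06-19", 7)],
    )
    # The lookup table also holds a date (2015-06-20) with no sales at all.
    cur.executemany(
        "INSERT INTO date_table VALUES (?, ?, ?, ?)",
        [
            ("2015-06-12", "W24", "2015-06", "2015-Q2"),
            ("2015-06-19", "W25", "2015-06", "2015-Q2"),
            ("2015-06-20", "W25", "2015-06", "2015-Q2"),
        ],
    )
    # Fact table on the left: every fact row is kept, and lookup rows
    # without a matching fact row do not appear in the result.
    rows = cur.execute(
        "SELECT d.{col}, SUM(f.numbOfSell) "
        "FROM fact_table f LEFT JOIN date_table d ON f.logDate = d.logDate "
        "GROUP BY d.{col} ORDER BY 1".format(col=column)
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(sales_by("logDate"))
    print(sales_by("quarter"))
```

With the fact table on the left, the date that has no sales (2015-06-20) is absent from the per-date result, which matches the stated goal of only seeing dates for which the fact table has rows.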
