Re: Modeling hierarchies

alex schufo Wed, 29 Jul 2015 05:36:08 -0700

OK, I guess this is https://issues.apache.org/jira/browse/KYLIN-831, right?

I upgraded to 0.7.2 today and hope that solves the problem.

Regards

On Tue, Jul 28, 2015 at 5:52 PM, alex schufo <[email protected]> wrote:

> I still don't understand this.
>
> I have a simple fact table and a simple SAMPLE_DIM lookup table. They are
> joined on SAMPLE_ID.
>
> If I do as you say and include all the columns of SAMPLE_DIM as a
> hierarchy, without SAMPLE_ID, then the cube builds successfully but I
> cannot query with the hierarchy. Any join results in this error:
>
> Column 'SAMPLE_ID' not found in table 'SAMPLE_DIM'
>
> Indeed, if I do a select * from 'SAMPLE_DIM' I can see all the hierarchy
> columns but not the SAMPLE_ID used to join with the fact table.
>
> If I include the SAMPLE_ID in the hierarchy definition then the cube build
> fails on step 3 with:
>
> java.lang.NullPointerException: Column DEFAULT.FACT_TABLE.SAMPLE_ID does not exist in row key desc
>         at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
>         at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
>         at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
>         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
>         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>         at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> (the SAMPLE_ID *does* exist in the FACT_TABLE)
>
> The only scenario where I could make it work is when I also create a
> derived dimension SAMPLE_ID / something else; then somehow the SAMPLE_ID
> is included and can be queried.
>
> Any help with that?
>
>
> On Fri, Jun 19, 2015 at 1:37 PM, alex schufo <[email protected]> wrote:
>
>> Thanks for the answer,
>>
>> Indeed, I had a look at these slides before. They are great for
>> understanding the high-level concepts, but I still ended up spending quite
>> some time on the issues mentioned below when designing my dimensions.
>>
>> On Fri, Jun 19, 2015 at 11:23 AM, jason zhong <[email protected]> wrote:
>>
>>> Hi Alex,
>>>
>>> We have a slide deck to help you understand how to build a cube. I don't
>>> know whether you have read it? It will help you understand derived and
>>> hierarchy dimensions.
>>>
>>> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>>>
>>> For your hierarchy case, log_date should not be included in the
>>> hierarchy. You have helped us find a bug here; we will follow up on it.
>>>
>>> Also, more documentation and UI enhancements will be done to help users
>>> build cubes easily.
>>>
>>> Thanks!!
>>>
>>> On Fri, Jun 12, 2015 at 5:07 PM, alex schufo <[email protected]> wrote:
>>>
>>> > I am trying to create a simple cube with a fact table and 3 dimensions.
>>> >
>>> > I have read the different slideshares and wiki pages, but I found that
>>> > the documentation is not very specific on how to manage hierarchies.
>>> >
>>> > Let's take this simple example :
>>> >
>>> > Fact table: productID, storeID, logDate, numbOfSell, etc.
>>> >
>>> > Date lookup table : logDate, week, month, quarter, etc.
>>> >
>>> > I specified a left join on logDate. When I specify this, it is not very
>>> > clear to me which table is considered the left table and which the
>>> > right table. I assumed the fact table was the left table and the lookup
>>> > table the right table; looking at it now, I think that might be a
>>> > mistake (I am just interested in dates for which there are results in
>>> > the fact table).
>>> >
>>> > If I use the auto generator, it creates a derived dimension; I don't
>>> > think that's what I need.
>>> >
>>> > So I created a hierarchy, but again, to me it's not clearly indicated
>>> > whether I should create ["quarter", "month", "week", "log_date"] or
>>> > ["logDate", "week", "month", "quarter"].
>>> >
>>> > Also, should I include log_date in the hierarchy? To me it was more
>>> > intuitive not to include it because it's already the join column, but
>>> > the cube was created without it and I cannot query by date: it says
>>> > that "log_date" is not found in the date table (it is in the Hive table
>>> > but not in the built cube). If I include it in the hierarchy, the cube
>>> > build fails with this error:
>>> >
>>> > java.lang.NullPointerException: Column DEFAULT.DATE_TABLE.LOG_DATE does not exist in row key desc
>>> >         at org.apache.kylin.cube.model.RowKeyDesc.getColDesc(RowKeyDesc.java:158)
>>> >         at org.apache.kylin.cube.model.RowKeyDesc.getDictionary(RowKeyDesc.java:152)
>>> >         at org.apache.kylin.cube.model.RowKeyDesc.isUseDictionary(RowKeyDesc.java:163)
>>> >         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:51)
>>> >         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>> >         at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> >         at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>> >         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> >         at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>> >         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> >         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >         at java.lang.Thread.run(Thread.java:744)
>>> >
>>> > result code:2
>>> >
>>> >
>>> > I think it might be useful to improve the documentation to explain this
>>> > more clearly, and not just the basic steps, because building a cube
>>> > even on short time ranges takes some time, so learning by trial and
>>> > error is very time consuming.
>>> >
>>> > Same thing for the derived dimensions: should I include ["storeID",
>>> > "storeName"] or just ["storeName"]? The second option seems to work for
>>> > me.
>>> >
>>> > Thanks
>>> >
>>>
>>
>>
>
