Srinivasan,

You can check out the 0.7-staging branch as a start. Look into
org.apache.kylin.dict.lookup.HiveTable: the implementation of
"getSignature()" and "getColumnDelimeter()" is not perfect. It calls
"getFileTable()", which checks the underlying HDFS file, and as we know
this is not suitable for external tables.

To fix the problem, those two methods need to be rewritten. In the new
"getSignature()", use the Hive API to get the table's path, size, and last
modified time; you may need to do some searching here. For the new
"getColumnDelimeter()", just returning DELIM_AUTO is okay.
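
Roughly, the new methods could look like the sketch below. This is only an
illustration, not the final code: the TableSignature(path, size,
lastModified) constructor, the DELIM_AUTO constant, and the database /
hiveTable fields are my assumptions, please verify them against the actual
0.7-staging code.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

// Inside org.apache.kylin.dict.lookup.HiveTable:

public TableSignature getSignature() throws IOException {
    HiveMetaStoreClient client = null;
    try {
        client = new HiveMetaStoreClient(new HiveConf());
        // "database" and "hiveTable" are assumed fields of HiveTable
        Table t = client.getTable(database, hiveTable);

        // Everything comes from the metastore, so the underlying HDFS
        // file is never touched -- that is what makes external tables work.
        String path = t.getSd().getLocation();

        // Hive keeps these as string parameters on the table; they can be
        // missing (e.g. a never-analyzed external table), so default to 0.
        Map<String, String> params = t.getParameters();
        String totalSize = params.get("totalSize");
        String lastDdlTime = params.get("transient_lastDdlTime"); // seconds
        long size = (totalSize == null) ? 0 : Long.parseLong(totalSize);
        long lastModified = (lastDdlTime == null) ? 0 : Long.parseLong(lastDdlTime) * 1000L;

        return new TableSignature(path, size, lastModified);
    } catch (Exception e) {
        throw new IOException("Cannot get signature for " + hiveTable, e);
    } finally {
        if (client != null)
            client.close();
    }
}

public String getColumnDelimeter() {
    return DELIM_AUTO; // no need to probe the data file any more
}

The key point is that the signature is built purely from metastore metadata,
so no file under the warehouse directory is read for external tables.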

After you finish the code and pass all the unit tests, please create a patch
and attach it to the JIRA for review ("pull request" is not accepted anymore).

Thanks for the contribution.



2015-06-17 1:10 GMT+08:00 Srinivasan Hariharan <[email protected]>:

> Hi,
>
> I am interested in contributing to this JIRA; could anyone help me with
> where I can start?
>
> https://issues.apache.org/jira/browse/KYLIN-824
>
> Regards,
> Srinivasan Hariharan
>
>
>
> From: [email protected]
> To: [email protected]
> Subject: RE: Hive external Table Dimension
> Date: Thu, 11 Jun 2015 21:51:08 +0530
>
> Thanks,
>
> I have created the JIRA.
>
> https://issues.apache.org/jira/browse/KYLIN-824
>
> I am interested in contributing; I will look through the code and post an
> update if I need help.
>
>
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: Hive external Table Dimension
> > Date: Thu, 11 Jun 2015 14:33:59 +0000
> >
> > Kylin needs to take snapshots of lookup tables for runtime queries (to
> > derive the dimensions that are not on the row key); that's why it tries
> > to seek the underlying data file.
> >
> > So far it couldn't move ahead without this. In the long run, Kylin can
> > consider abstracting this; please open a JIRA as a requirement if you like.
> >
> > On 6/11/15, 5:45 PM, "Srinivasan Hariharan02" <[email protected]> wrote:
> >
> > >Hi,
> > >
> > >I have an external dimension table in Hive which is created using the
> > >HBase storage handler. After creating the cube using this Hive table,
> > >the cube build job failed in the "Build Dimension Dictionary" step with
> > >the below error:
> > >java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under
> > >hdfs://host:8020/user/hive/warehouse/hbase.db/department/, but find 0
> > >        at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
> > >        at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
> > >        at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
> > >        at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
> > >        at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
> > >        at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> > >        at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> > >        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > >        at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > >        at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > >        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > >        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >        at java.lang.Thread.run(Thread.java:744)
> > >
> > >Since the external table is created from another source like HBase,
> > >Hive doesn't store any data in its warehouse directory, so Kylin should
> > >not check for files under the warehouse dir for external tables. Please
> > >help.
> > >
> > >Regards,
> > >Srinivasan Hariharan
> > >Mob +91-9940395830