Kylin doesn't treat HCatalog as a third-party jar. It assumes the Hive libraries are part of the Hadoop cluster, just like the common Hadoop libs, and that all nodes in the cluster are identical. If you can't install HCatalog on your Hadoop cluster, a possible workaround is to embed the HCatalog classes in Kylin's job jar, which is submitted to all worker nodes as a third-party lib. We haven't verified this, but you can give it a try:

1. Check out the Kylin code repository from https://github.com/apache/incubator-kylin.git and use the master branch;
2. Find the dependency declaration of hcatalog in the kylin-job module and remove "<scope>provided</scope>" so that the default scope is used: https://github.com/apache/incubator-kylin/blob/master/job/pom.xml#L210
3. Run "mvn package -DskipTests" under the Kylin project folder to re-package the jars;
4. Check the new job/target/kylin-job-0.7.1-incubating-job.jar; it should include the HCatalog classes;
5. Copy and rename this jar to your Kylin installation in $KYLIN_HOME/lib/, overwriting the old one (back up the old jar to another folder first);
6. Restart Kylin and then resume the failed job to see whether the ClassNotFound error is still there.

If it works, please let us know.

2015-06-15 22:33 GMT+08:00 alex schufo <[email protected]>:

> I suspect that the HCatalog jar is not on the Hadoop nodes, or in a
> different location, but I am not the Hadoop administrator so I am not
> allowed to modify that.
>
> I was reading this article:
> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
> and my understanding was that by specifying the third party jar when
> launching the MR job it would be made available to the Hadoop nodes. I
> thought that the RunJar command in bin/kylin.sh was doing something
> similar.
>
> Also this article mentions that installing the jars on the cluster nodes
> is deprecated.
>
> On Mon, Jun 15, 2015 at 2:58 PM, ShaoFeng Shi <[email protected]>
> wrote:
>
> > is Hive/hcatalog installed on all hadoop nodes, with the same location?
> >
> > 2015-06-15 19:10 GMT+08:00 alex schufo <[email protected]>:
> >
> > > Hello, I installed Kylin on a new Hadoop cluster.
> > >
> > > On the Kylin instance HCatalog is found at
> > > /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.13.0.2.1.7.0-784.jar
> > > and I don't get any error while running bin/find-hive-dependency.sh (see
> > > full output below).
> > >
> > > However when I build a cube the Extract Fact Table Distinct Columns step
> > > fails because the MR cannot find the HCat dependency. There is no exception
> > > in tomcat/logs/kylin.log
> > >
> > > Just this :
> > >
> > > [pool-7-thread-3]:[2015-06-15 03:10:05,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - *State of Hadoop job: job_1430752988188_1332267:RUNNING-UNDEFINED*
> > > [pool-7-thread-3]:[2015-06-15 03:10:05,505][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:171)] - Saving resource /execute_output/0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 (Store kylin_metadata@hbase)
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,515][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,516][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:90)] - Job job_1430752988188_1332267 get status check result.
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,516][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - *State of Hadoop job: job_1430752988188_1332267:FINISHED-FAILED*
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,520][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:171)] - Saving resource /execute_output/0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 (Store kylin_metadata@hbase)
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,704][WARN][org.apache.kylin.job.common.HadoopCmdOutput.updateJobCounter(HadoopCmdOutput.java:89)] - no counters for job job_1430752988188_1332267
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,708][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:171)] - Saving resource /execute_output/0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 (Store kylin_metadata@hbase)
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,715][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:171)] - Saving resource /execute_output/0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 (Store kylin_metadata@hbase)
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,733][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:171)] - Saving resource /execute_output/0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 (Store kylin_metadata@hbase)
> > > [pool-7-thread-3]:[2015-06-15 03:10:15,736][INFO][org.apache.kylin.job.manager.ExecutableManager.updateJobOutput(ExecutableManager.java:222)] - *job id:0c56071b-4460-4e87-9f8b-8ea1d525d3ec-01 from RUNNING to ERROR*
> > >
> > > On the Hadoop node we can see that the MR job fails because HCatalog was
> > > not found:
> > >
> > > Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
> > >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1961)
> > >     at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> > >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> > >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > > Caused by: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
> > >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1867)
> > >     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1959)
> > >     ... 8 more
> > >
> > > $ bin/find-hive-dependency.sh
> > >
> > > Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
> > > SLF4J: Class path contains multiple SLF4J bindings.
> > > SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: Found binding in [jar:file:/opt/edw/hive/auxlib/hive-udfs.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > > > > > > hive dependency: > > > > > > > > > /etc/hive/conf:/usr/lib/hive/lib/hive-serde-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/commons-dbcp-1.4.jar:/usr/lib/hive/lib/asm-commons-3.1.jar:/usr/lib/hive/lib/jdo-api-3.0.1.jar:/usr/lib/hive/lib/derbyclient-10.10.1.1.jar:/usr/lib/hive/lib/antlr-runtime-3.4.jar:/usr/lib/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar:/usr/lib/hive/lib/hive-service-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-common-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/lib/hive/lib/hive-shims-common-secure-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/derbynet-10.10.1.1.jar:/usr/lib/hive/lib/httpcore-4.2.5.jar:/usr/lib/hive/lib/jpam-1.1.jar:/usr/lib/hive/lib/hive-exec-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-metastore-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/commons-httpclient-3.0.1.jar:/usr/lib/hive/lib/velocity-1.5.jar:/usr/lib/hive/lib/guava-11.0.2.jar:/usr/lib/hive/lib/eigenbase-xom-1.3.4.jar:/usr/lib/hive/lib/commons-compiler-2.7.3.jar:/usr/lib/hive/lib/libfb303-0.9.0.jar:/usr/lib/hive/lib/commons-pool-1.5.4.jar:/usr/lib/hive/lib/libthrift-0.9.0.jar:/usr/lib/hive/lib/avro-1.7.5.jar:/usr/lib/hive/lib/commons-cli-1.2.jar:/usr/lib/hive/lib/hive-shims-common.jar:/usr/lib/hive/lib/stax-api-1.0.1.jar:/usr/lib/hive/lib/hive-shims-0.20-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-cli-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/oro-2.0.8.jar:/usr/lib/hive/lib/eigenbase-properties-1.1.4.jar:/usr/lib/hive/lib/hive-ant.jar:/usr/lib/hive/lib/zookeeper-3.4.5.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-hwi-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/commons-codec-1.4.jar:/usr/lib/hive/lib/mail-1.4.1.jar:/usr/lib/hive/lib/hive-shims-common-secure.jar:/usr/lib/hive/lib/servlet-api-2.5.jar:/usr/lib/hive/lib/optiq-core-0.5.jar:/usr/lib/hive/lib/ST4-4.0.4.jar:/usr/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/hive/lib/hive-common.jar:/usr/lib/hive/lib/http
client-4.2.5.jar:/usr/lib/hive/lib/hive-hbase-handler-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-jdbc-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-serde.jar:/usr/lib/hive/lib/derby-10.10.1.1.jar:/usr/lib/hive/lib/hive-hwi.jar:/usr/lib/hive/lib/optiq-avatica-0.5.jar:/usr/lib/hive/lib/hive-exec.jar:/usr/lib/hive/lib/hive-contrib.jar:/usr/lib/hive/lib/hive-contrib-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-shims.jar:/usr/lib/hive/lib/junit-4.10.jar:/usr/lib/hive/lib/jta-1.1.jar:/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hive/lib/hive-ant-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-shims-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-testutils-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/antlr-2.7.7.jar:/usr/lib/hive/lib/hive-shims-0.23-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/hive-testutils.jar:/usr/lib/hive/lib/xz-1.0.jar:/usr/lib/hive/lib/commons-collections-3.1.jar:/usr/lib/hive/lib/hive-metastore.jar:/usr/lib/hive/lib/commons-lang-2.4.jar:/usr/lib/hive/lib/paranamer-2.3.jar:/usr/lib/hive/lib/jetty-all-7.6.0.v20120127.jar:/usr/lib/hive/lib/commons-compress-1.4.1.jar:/usr/lib/hive/lib/asm-tree-3.1.jar:/usr/lib/hive/lib/hive-cli.jar:/usr/lib/hive/lib/hive-beeline-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/janino-2.7.3.jar:/usr/lib/hive/lib/hive-shims-0.20S-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/groovy-all-2.1.6.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/hive/lib/hive-shims-common-0.13.0.2.1.7.0-784.jar:/usr/lib/hive/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/hive/lib/jline-0.9.94.jar:/usr/lib/hive/lib/datanucleus-core-3.2.10.jar:/usr/lib/hive/lib/ant-launcher-1.9.1.jar:/usr/lib/hive/lib/ant-1.9.1.jar:/usr/lib/hive/lib/hamcrest-core-1.1.jar:/usr/lib/hive/lib/snappy-java-1.0.5.jar:/usr/lib/hive/lib/stringtemplate-3.2.1.jar:/usr/lib/hive/lib/commons-io-2.4.jar:/usr/lib/hive/lib/hive-hbase-handler.jar:/usr/lib/hive/lib/servlet-api-2.5-20081211.jar:/usr/lib/hive/lib/tempus-fugit-1.1.jar:/usr/lib/hive/lib/linq4j-0.1.13.jar:/usr/lib/hive/lib/geronimo-annotation_1.0_spec
-1.1.1.jar:/usr/lib/hive/lib/jetty-6.1.26.jar:/usr/lib/hive/lib/jetty-util-6.1.26.jar:/usr/lib/hive/lib/bonecp-0.8.0.RELEASE.jar:/usr/lib/hive/lib/hive-beeline.jar:/usr/lib/hive/lib/jsr305-1.3.9.jar:/usr/lib/hive/lib/activation-1.1.jar:/usr/lib/hive/lib/log4j-1.2.16.jar:/usr/lib/hive/lib/commons-logging-1.1.3.jar:/usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.13.0.2.1.7.0-784.jar > > > > > >
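
For the step in the reply that checks whether the rebuilt job jar includes the HCatalog classes, here is a minimal sketch in Python. It relies only on a jar being a zip archive. The helper name jar_contains is hypothetical and not part of Kylin. The class it looks for is the one reported missing in the stack trace above.

```python
import zipfile

# The class reported missing in the stack trace, written as a path inside a jar.
HCAT_CLASS = "org/apache/hive/hcatalog/mapreduce/HCatInputFormat.class"


def jar_contains(jar_path, entry=HCAT_CLASS):
    """Return True if the jar (a plain zip archive) has the given entry.

    Hypothetical helper for illustration only; it is not part of Kylin.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_contains("job/target/kylin-job-0.7.1-incubating-job.jar") from the Kylin project folder after the rebuild should return True if the classes were embedded. A False result would suggest the hcatalog dependency is still in "provided" scope.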
