[
https://issues.apache.org/jira/browse/KYLIN-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997151#comment-16997151
]
rongneng.wei commented on KYLIN-4206:
-------------------------------------
This modification mainly adds support for the AWS Glue catalog in Kylin. The
associated Jira issue is
[https://issues.apache.org/jira/browse/KYLIN-4206](https://issues.apache.org/jira/browse/KYLIN-4206).
1. First, modify the aws-glue-data-catalog-client source code.
The aws-glue-data-catalog-client-for-apache-hive-metastore GitHub repository is
[https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore].
For the aws-glue-client development environment, see its README.MD.
I downloaded Hive 2.3.7 locally, so after following the steps in the
README.MD file, the Hive version is 2.3.7-SNAPSHOT.
1) Modify the pom.xml file in the project's root directory:
<hive2.version>2.3.7-SNAPSHOT</hive2.version>
<spark-hive.version>1.2.1.spark2</spark-hive.version>
2) Modify the class
com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient in
aws-glue-datacatalog-hive2-client.
Add the following method implementation:
    @Override
    public PartitionValuesResponse listPartitionValues(PartitionValuesRequest partitionValuesRequest)
            throws MetaException, TException, NoSuchObjectException {
        // Stub: returns null, so listing partition values is effectively unsupported.
        return null;
    }
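The stub above returns null, presumably only to satisfy a method that the Hive 2.3.7 client interface declares. Any caller that actually invokes it may then hit a NullPointerException far from the cause. The following is a minimal sketch of the same pattern with a fail-fast alternative. The types `MetaClient`, `ValuesRequest`, and `ValuesResponse` are hypothetical stand-ins, not the real Hive or Glue classes.

```java
// Hypothetical stand-ins for the Hive metastore types; not the real API.
class ValuesRequest {}
class ValuesResponse {}

interface MetaClient {
    ValuesResponse listPartitionValues(ValuesRequest request) throws Exception;
}

// Mirrors the stub in the comment: compiles, but hands callers a null.
class NullStubClient implements MetaClient {
    @Override
    public ValuesResponse listPartitionValues(ValuesRequest request) {
        return null;
    }
}

// Alternative: make an unexpected call fail with a clear message.
class FailFastClient implements MetaClient {
    @Override
    public ValuesResponse listPartitionValues(ValuesRequest request) {
        throw new UnsupportedOperationException(
            "listPartitionValues is not implemented by this client");
    }
}
```

Whether the fail-fast variant is acceptable depends on whether Kylin or Hive ever calls this method during a cube build, which the comment does not say.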
3) Modify the class
com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient in
aws-glue-datacatalog-spark-client.
The problem is that this class contains a method that is not available in the
parent class, so delete that method. Then copy the method from
com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient in
aws-glue-datacatalog-hive2-client. Finally, add this dependency to the
aws-glue-datacatalog-spark-client/pom.xml file:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive2.version}</version>
<scope>provided</scope>
</dependency>
4) Package. Three projects need to be packaged; they produce the three jars
listed in the next step.
5) Copy the three jars
aws-glue-datacatalog-client-common-1.10.0-SNAPSHOT.jar
aws-glue-datacatalog-hive2-client-1.10.0-SNAPSHOT.jar
aws-glue-datacatalog-spark-client-1.10.0-SNAPSHOT.jar
to /kylin/lib.
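A quick way to confirm the copy step worked is to check that all three jars are present in the lib directory. This is a minimal sketch; `GlueJarCheck` and `missingJars` are hypothetical names used for illustration and are not part of Kylin or the Glue client.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlueJarCheck {
    // Jar names exactly as listed in the step above.
    static final List<String> REQUIRED = Arrays.asList(
        "aws-glue-datacatalog-client-common-1.10.0-SNAPSHOT.jar",
        "aws-glue-datacatalog-hive2-client-1.10.0-SNAPSHOT.jar",
        "aws-glue-datacatalog-spark-client-1.10.0-SNAPSHOT.jar");

    // Returns the required jars that are not regular files in libDir.
    static List<String> missingJars(Path libDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(libDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

For example, `GlueJarCheck.missingJars(java.nio.file.Paths.get("/kylin/lib"))` should return an empty list once all three jars are in place.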
2. Modify the Kylin source code; see the submitted PR.
1) Add the metadata type setting to kylin.properties:
## The default HiveMetastoreClient access type is hcatalog. If you are an AWS
## user and use the Glue catalog, it can be configured as gluecatalog.
##kylin.source.hive.metadata-type=hcatalog
The default is hcatalog. To use Glue, set
kylin.source.hive.metadata-type=gluecatalog.
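To illustrate how this setting resolves, here is a minimal sketch that reads the key with `java.util.Properties` and falls back to hcatalog when the line is absent or commented out. `MetadataTypeConfig` is a hypothetical helper written for this illustration; it is not the code from the PR and not Kylin's own configuration class.

```java
import java.util.Properties;

class MetadataTypeConfig {
    static final String KEY = "kylin.source.hive.metadata-type";

    // Returns the configured metadata type, or "hcatalog" if the key is unset.
    // Properties keeps trailing whitespace in values, hence the trim().
    static String metadataType(Properties props) {
        return props.getProperty(KEY, "hcatalog").trim();
    }

    // True when the Glue catalog has been selected.
    static boolean usesGlue(Properties props) {
        return "gluecatalog".equalsIgnoreCase(metadataType(props));
    }
}
```

Lines beginning with `#` are treated as comments by `Properties.load`, so the commented-out `##kylin.source.hive.metadata-type=hcatalog` line leaves the key unset and the default applies.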
If gluecatalog is configured, hive-site.xml also needs the following property:
<property>
<name>hive.metastore.client.factory.class</name>
<value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
</property>
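One way to verify that hive-site.xml actually carries this property is to parse the file and look the name up. This is a minimal sketch using the JDK's DOM parser; `HiveSiteCheck` and `propertyValue` are hypothetical names for illustration, and the code assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class HiveSiteCheck {
    static final String FACTORY_KEY = "hive.metastore.client.factory.class";

    // Returns the value of the named property, or null if it is not present.
    static String propertyValue(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }
}
```

Passing an input stream for hive-site.xml together with `HiveSiteCheck.FACTORY_KEY` should return `com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory` when the property is set as shown above. Properties inside XML comments are ignored by the parser.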
3. Install on EMR
> Build kylin on EMR 5.23. The kylin version is 2.6.4. When building the cube,
> the hive table cannot be found
> -----------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-4206
> URL: https://issues.apache.org/jira/browse/KYLIN-4206
> Project: Kylin
> Issue Type: Bug
> Components: Environment
> Affects Versions: v2.6.4
> Environment: EMR 5.23(hadoop 2.8.5\HBase 1.4.9\hive 2.3.4\Spark
> 2.4.0\Tez 0.9.1\HCatalog 2.3.4\Zookeeper 3.4.13)
> kylin 2.6.4
> Reporter: rongneng.wei
> Priority: Major
> Attachments: kylin.properties, kylin_hive_conf.xml, kylin_job_conf.xml
>
>
> Hi,
> I built Kylin on EMR 5.23. The Kylin version is 2.6.4. When building the
> cube, the Hive table cannot be found. The detailed error information is as
> follows:
> java.lang.RuntimeException: java.io.IOException:
> NoSuchObjectException(message:kylin_flat_db_test1.kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba
> table not found)java.lang.RuntimeException: java.io.IOException:
> NoSuchObjectException(message:kylin_flat_db_test1.kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba
> table not found) at
> org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:83)
> at
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
> at
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
> at
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> On EMR, Hive metadata is shared through Glue, and the metastore URL is
> configured in hive-site.xml:
> <property>
> <name>hive.metastore.uris</name>
> <value>thrift://ip-172-40-15-164.ec2.internal:9083</value>
> <description>JDBC connect string for a JDBC metastore</description>
> </property>
> <property>
> <name>hive.metastore.client.factory.class</name>
> <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
> </property>
> However, when I use Hive's own metadata instead of sharing metadata through
> Glue, that is, when I comment out the following configuration, the above
> exception does not occur.
> <!--<property>
> <name>hive.metastore.client.factory.class</name>
> <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
> </property>
> -->
> But since EMR uses shared metadata, if I don't use metadata sharing, I
> can't query other Hive tables built by the cluster.
> The configuration files are in the attachments. Please help me solve
> this problem. Thank you.
> Best regards.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)