[ 
https://issues.apache.org/jira/browse/KYLIN-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997149#comment-16997149
 ] 

ASF GitHub Bot commented on KYLIN-4206:
---------------------------------------

Rongnengwei commented on pull request #995: KYLIN-4206  Add kylin supports aws 
glue catalog metastroeclient
URL: https://github.com/apache/kylin/pull/995
 
 
   
   This change adds AWS Glue Data Catalog support to Kylin. The associated Jira issue is 
[https://issues.apache.org/jira/browse/KYLIN-4206](https://issues.apache.org/jira/browse/KYLIN-4206).
   1. First, modify the aws-glue-data-catalog-client source code.
   The aws-glue-data-catalog-client-for-apache-hive-metastore repository is at 
https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore. 
For the aws-glue-client development environment, see README.MD.
   I downloaded Hive 2.3.7 locally, so after following the steps in the 
README.MD file, the Hive version is 2.3.7-SNAPSHOT.
   1) Modify the pom.xml file in the root directory of the project:
   <hive2.version>2.3.7-SNAPSHOT</hive2.version> 
   <spark-hive.version>1.2.1.spark2</spark-hive.version>
   2) Modify the class 
aws-glue-datacatalog-hive2-client/com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient 
and implement the following method:
   
   ```
   // Stub implementation: it only satisfies the interface and returns null.
   @Override
   public PartitionValuesResponse listPartitionValues(PartitionValuesRequest partitionValuesRequest)
       throws MetaException, TException, NoSuchObjectException {
     return null;
   }
   ```
   
   3) Modify the class 
aws-glue-datacatalog-spark-client/com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.
   The problem is as follows:
   
   One of its methods is not available in the parent class, so delete that method. 
Then copy the method from 
aws-glue-datacatalog-hive2-client/com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient 
and add the following dependency to the aws-glue-datacatalog-spark-client/pom.xml file:
   ```
   <dependency>
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive2.version}</version>
       <scope>provided</scope>
   </dependency>
   ```
   4) Package the following three projects:
   aws-glue-datacatalog-client-common
   aws-glue-datacatalog-hive2-client
   aws-glue-datacatalog-spark-client
   5) Copy the three resulting jars
   aws-glue-datacatalog-client-common-1.10.0-SNAPSHOT.jar
   aws-glue-datacatalog-hive2-client-1.10.0-SNAPSHOT.jar
   aws-glue-datacatalog-spark-client-1.10.0-SNAPSHOT.jar
   to /kylin/lib.
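
After copying the jars, a quick way to confirm they are visible is to try loading the Glue client factory class by name. The following is a minimal sketch, not part of the PR: the class `GlueClasspathCheck` and its method `isLoadable` are hypothetical names, and only the factory class name comes from the hive-site.xml setting described in this comment.

```java
public class GlueClasspathCheck {
    // Factory class name taken from the hive-site.xml setting in this comment.
    public static final String GLUE_FACTORY =
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory";

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: locate and load the class without running static initializers.
            Class.forName(className, false, GlueClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers the case where the class is present but a class
            // it depends on (for example a Hive interface) is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(GLUE_FACTORY + " loadable: " + isLoadable(GLUE_FACTORY));
    }
}
```

Run it with the same classpath Kylin uses (the jars in /kylin/lib plus the Hive jars); a `false` result suggests the copy in step 5 did not take effect or a Hive dependency is missing.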
   2. Modify the Kylin source code; see the changes submitted in this PR.
   1) Add the gluecatalog option to kylin.properties:
   
   ```
   ## The default HiveMetastoreClient access type is hcatalog. If you are an AWS user
   ## and use the Glue catalog, this can be set to gluecatalog.
   ##kylin.source.hive.metadata-type=hcatalog
   ```
   
   The default is hcatalog. To use Glue, set 
kylin.source.hive.metadata-type=gluecatalog.
   If gluecatalog is configured, you also need to add the following to hive-site.xml:
   
   ```
     <property>
       <name>hive.metastore.client.factory.class</name>
       <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
     </property>
   ```
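
To illustrate how a switch like `kylin.source.hive.metadata-type` could be interpreted, here is a minimal sketch using plain `java.util.Properties`. It is not the code from the PR: the class `MetadataTypeSketch`, the enum `MetadataType`, and the method `resolve` are hypothetical names, and only the property key and its two values (hcatalog, gluecatalog) come from the text above.

```java
import java.util.Locale;
import java.util.Properties;

public class MetadataTypeSketch {
    /** The two metadata access types named in this comment. */
    public enum MetadataType { HCATALOG, GLUECATALOG }

    // Property key as given in kylin.properties above.
    public static final String KEY = "kylin.source.hive.metadata-type";

    /** Resolves the configured type, falling back to hcatalog when the key is unset. */
    public static MetadataType resolve(Properties props) {
        String raw = props.getProperty(KEY, "hcatalog");
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "hcatalog":
                return MetadataType.HCATALOG;
            case "gluecatalog":
                return MetadataType.GLUECATALOG;
            default:
                // Failing fast on unknown values is a choice made for this sketch.
                throw new IllegalArgumentException("Unsupported " + KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "gluecatalog");
        System.out.println(resolve(props));
    }
}
```

Rejecting unknown values keeps a typo in kylin.properties from silently falling back to hcatalog, which would reproduce the "table not found" symptom from the issue.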
   
   3. Install on EMR
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Build kylin on EMR 5.23. The kylin version is 2.6.4. When building the cube, 
> the hive table cannot be found
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4206
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4206
>             Project: Kylin
>          Issue Type: Bug
>          Components: Environment 
>    Affects Versions: v2.6.4
>         Environment: EMR 5.23(hadoop 2.8.5\HBase 1.4.9\hive 2.3.4\Spark 
> 2.4.0\Tez 0.9.1\HCatalog 2.3.4\Zookeeper 3.4.13)
> kylin 2.6.4
>            Reporter: rongneng.wei
>            Priority: Major
>         Attachments: kylin.properties, kylin_hive_conf.xml, kylin_job_conf.xml
>
>
> Hi,
>    I built Kylin on EMR 5.23. The Kylin version is 2.6.4. When building the 
> cube, the Hive table cannot be found. The detailed error information is as 
> follows:
> java.lang.RuntimeException: java.io.IOException: 
> NoSuchObjectException(message:kylin_flat_db_test1.kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba
>  table not found)java.lang.RuntimeException: java.io.IOException: 
> NoSuchObjectException(message:kylin_flat_db_test1.kylin_intermediate_kylin_sales_cube_4e93b31d_3be2_c9e8_55de_a9814f63c4ba
>  table not found) at 
> org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:83)
>  at 
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
>  at 
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
>  at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>  at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>  at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> On the EMR, hive metadata is shared by glue, and the URL of Metastore is 
> configured in hive-site.xml.
> <property>
>  <name>hive.metastore.uris</name>
>  <value>thrift://ip-172-40-15-164.ec2.internal:9083</value>
>  <description>JDBC connect string for a JDBC metastore</description>
>  </property>
> <property>
>  <name>hive.metastore.client.factory.class</name>
>  
> <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
>  </property>
> But when I use Hive's own metadata, that is, without using Glue to share 
> metadata, the above exception does not occur. To do this, I comment out the 
> following configuration.
> <!--<property>
> <name>hive.metastore.client.factory.class</name>
> <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
> </property>
> -->
> But since EMR uses shared metadata, without metadata sharing I 
> cannot query other Hive tables built by the cluster.
> The configuration files are attached. Please help me solve 
> this problem. Thank you.
> Best regards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
