linghengqian commented on issue #32223: URL: https://github.com/apache/shardingsphere/issues/32223#issuecomment-2243123751
# TL;DR

- What I am about to say may sound a bit rambling, but I do hope that people with Hive knowledge can help with the problems the master branch runs into on HiveServer2, whether that is the master branch of ShardingSphere or the master branch of Hive.
- We can leave ShardingSphere Proxy aside and discuss ShardingSphere JDBC first; both require optional modules to parse the Hive dialect. First, build ShardingSphere and install the artifacts into the local Maven repository. You need to remove `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` as in https://github.com/apache/shardingsphere/pull/31526 to prevent the creation of an embedded HiveServer2, or raise a PR that removes this file, which is marked TODO. Apparently this code involves reading a string like `thrift://<host_name>:<port>`. Feel free to check https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration

```shell
sdk install java 22.0.1-graalce
sdk use java 22.0.1-graalce

git clone git@github.com:apache/shardingsphere.git
cd ./shardingsphere/
git reset --hard 45b69b4d0d249f31b01ce963d82debb7de751da4
./mvnw clean install -Prelease -T1C -DskipTests -Djacoco.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true
```

- For ShardingSphere JDBC, you may need to add the following dependencies to your project's `pom.xml` to connect to HiveServer2. You can refer to https://github.com/apache/shardingsphere/pull/31526 .
```xml
<project>
    <dependencies>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-infra-database-hive</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-parser-sql-hive</artifactId>
            <version>5.5.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>4.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
            <version>4.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-slf4j-impl</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-reload4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.antlr</groupId>
                    <artifactId>antlr4-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>commons-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-dbcp2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>commons-io</groupId>
                    <artifactId>commons-io</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>commons-lang</groupId>
                    <artifactId>commons-lang</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-pool2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>janino</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.woodstox</groupId>
                    <artifactId>woodstox-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.bouncycastle</groupId>
                    <artifactId>bcprov-jdk15on</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client-api</artifactId>
            <version>3.3.5</version>
        </dependency>
    </dependencies>
</project>
```

- I would say you can't use `org.apache.hive:hive-jdbc:4.0.0` with `classifier` set to `standalone`, because that artifact has class conflicts with ShardingSphere. This was reported in https://issues.apache.org/jira/browse/HIVE-28308 and fixed in https://github.com/apache/hive/pull/5313 . That PR is on the `apache/hive:4.0.1` milestone, which is to be released 2 months later. So for now you have to deal with the huge and impactful dependency tree of `org.apache.hive:hive-jdbc:4.0.0`. Please note that Hive does not support Hadoop 3.4.x yet; the CI for https://github.com/apache/hive/pull/5187 is still broken. The HiveServer2 JDBC Driver uses multiple versions of the Hadoop API internally, which makes it a terrible dependency.
- Create several HiveServer2 Docker containers as shown in https://github.com/apache/shardingsphere/pull/31526 , and then write a ShardingSphere configuration file as you like.
```yaml
mode:
  type: Standalone
  repository:
    type: JDBC

dataSources:
  ds_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_0
  ds_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_1
  ds_2:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_2

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes:
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
    t_order_item:
      actualDataNodes:
      keyGenerateStrategy:
        column: order_item_id
        keyGeneratorName: snowflake
  defaultDatabaseStrategy:
    standard:
      shardingColumn: user_id
      shardingAlgorithmName: inline
  shardingAlgorithms:
    inline:
      type: CLASS_BASED
      props:
        strategy: STANDARD
        algorithmClassName: org.apache.shardingsphere.test.natived.jdbc.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
  auditors:
    sharding_key_required_auditor:
      type: DML_SHARDING_CONDITIONS
- !BROADCAST
  tables:
    - t_address

props:
  sql-show: false
```

- Feel free to test some `SELECT`, `INSERT`, and `DELETE` statements on the ShardingSphere JDBC DataSource. Please note that `org.apache.shardingsphere:shardingsphere-parser-sql-hive` cannot yet parse `CREATE TABLE`, `SET`, or `TRUNCATE TABLE` statements. Feel free to submit a PR.
- The master branch of ShardingSphere is currently tested with the HiveServer2 JDBC Driver of `apache/hive:3.1.3`. Although you can compile ShardingSphere with JDK11-JDK22 and run the Hive integration unit tests on the ShardingSphere side with JDK8-JDK22, I would still say you should take care of yourself. After all, the master branch of Hive can only be compiled with JDK8.

# Simple summary

- Support for HiveServer2 is on the milestone for ShardingSphere 5.5.1, which has not yet been officially released.
- `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` has a known TODO. You should either delete this class before compiling the master branch of ShardingSphere, or implement the TODO marked in this class.
- The dependency management of the HiveServer2 JDBC Driver is a disaster. I suggest you test it on ShardingSphere JDBC before testing ShardingSphere Proxy.
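
As a rough illustration of the "test some SQL on the ShardingSphere JDBC DataSource" step, here is a minimal sketch rather than the project's own test code. It assumes the YAML configuration from this comment is saved on the classpath under the hypothetical name `hive-demo.yaml`, that the dependencies listed in the `pom.xml` are on the classpath, and that a `t_order` table with `order_id` and `user_id` columns was already created directly on each HiveServer2 instance (the Hive parser module does not handle `CREATE TABLE` yet). The class and method names are made up for the example; only the `jdbc:shardingsphere:classpath:` URL prefix comes from ShardingSphere's JDBC driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical smoke test; the class, method, and file names are assumptions.
final class HiveShardingSmokeTest {

    // Assumed classpath location of the YAML configuration from this comment.
    static final String CONFIG_FILE = "hive-demo.yaml";

    // Builds a URL for the ShardingSphere JDBC driver that points at a classpath YAML file.
    static String jdbcUrl(String classpathYaml) {
        return "jdbc:shardingsphere:classpath:" + classpathYaml;
    }

    // Prints the orders of one user; user_id is the sharding column in the YAML.
    static void printOrders(Connection connection, int userId) throws SQLException {
        String sql = "SELECT order_id, user_id FROM t_order WHERE user_id = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setInt(1, userId);
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    System.out.println(resultSet.getLong("order_id") + " " + resultSet.getInt("user_id"));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Requires running HiveServer2 containers and the ShardingSphere driver on the classpath.
        try (Connection connection = DriverManager.getConnection(jdbcUrl(CONFIG_FILE))) {
            printOrders(connection, 1);
        } catch (SQLException ex) {
            System.err.println("Could not query through ShardingSphere JDBC: " + ex.getMessage());
        }
    }
}
```

Going through `DriverManager` keeps the sketch free of compile-time ShardingSphere imports, so the same code can be pointed at a different YAML file by changing `CONFIG_FILE`.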
