hongjunsu commented on issue #32223: URL: https://github.com/apache/shardingsphere/issues/32223#issuecomment-2244084864
> # TL;DR
> * What I am about to say may ramble a bit, but I hope that people who know Hive can help with the problems the master branch hits on HiveServer2, whether that is the master branch of ShardingSphere or the master branch of Hive.
> * We can leave ShardingSphere Proxy aside and discuss ShardingSphere JDBC first. Both require optional modules to parse the Hive dialect. First, build ShardingSphere and install its artifacts into your local Maven repository. You need to remove `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader`, as in [Add GraalVM Reachability Metadata and corresponding nativeTest for HiveServer2 #31526](https://github.com/apache/shardingsphere/pull/31526), to prevent the creation of an embedded HiveServer2, or raise a PR to remove this file, which is marked TODO. Apparently this place involves reading a string like `thrift://<host_name>:<port>`. Feel free to check https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration
>
> ```shell
> sdk install java 22.0.1-graalce
> sdk use java 22.0.1-graalce
>
> git clone git@github.com:apache/shardingsphere.git
> cd ./shardingsphere/
> git reset --hard 45b69b4d0d249f31b01ce963d82debb7de751da4
> ./mvnw clean install -Prelease -T1C -DskipTests -Djacoco.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true
> ```
>
> * For ShardingSphere JDBC, you may need to add the following dependencies to your project's `pom.xml` to connect to HiveServer2. You can refer to [Add GraalVM Reachability Metadata and corresponding nativeTest for HiveServer2 #31526](https://github.com/apache/shardingsphere/pull/31526).
>
> ```xml
> <project>
>     <dependencies>
>         <dependency>
>             <groupId>org.apache.shardingsphere</groupId>
>             <artifactId>shardingsphere-jdbc</artifactId>
>             <version>5.5.1-SNAPSHOT</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.shardingsphere</groupId>
>             <artifactId>shardingsphere-infra-database-hive</artifactId>
>             <version>5.5.1-SNAPSHOT</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.shardingsphere</groupId>
>             <artifactId>shardingsphere-parser-sql-hive</artifactId>
>             <version>5.5.1-SNAPSHOT</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.hive</groupId>
>             <artifactId>hive-jdbc</artifactId>
>             <version>4.0.0</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.hive</groupId>
>             <artifactId>hive-service</artifactId>
>             <version>4.0.0</version>
>             <exclusions>
>                 <exclusion>
>                     <groupId>org.apache.logging.log4j</groupId>
>                     <artifactId>log4j-slf4j-impl</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.slf4j</groupId>
>                     <artifactId>slf4j-reload4j</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.apache.logging.log4j</groupId>
>                     <artifactId>log4j-api</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.antlr</groupId>
>                     <artifactId>antlr4-runtime</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.codehaus.janino</groupId>
>                     <artifactId>commons-compiler</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.apache.commons</groupId>
>                     <artifactId>commons-dbcp2</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>commons-io</groupId>
>                     <artifactId>commons-io</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>commons-lang</groupId>
>                     <artifactId>commons-lang</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.apache.commons</groupId>
>                     <artifactId>commons-pool2</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.codehaus.janino</groupId>
>                     <artifactId>janino</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>com.fasterxml.woodstox</groupId>
>                     <artifactId>woodstox-core</artifactId>
>                 </exclusion>
>                 <exclusion>
>                     <groupId>org.bouncycastle</groupId>
>                     <artifactId>bcprov-jdk15on</artifactId>
>                 </exclusion>
>             </exclusions>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.hadoop</groupId>
>             <artifactId>hadoop-client-api</artifactId>
>             <version>3.3.5</version>
>         </dependency>
>     </dependencies>
> </project>
> ```
>
> * I would say you can't use `org.apache.hive:hive-jdbc:4.0.0` with `classifier` set to `standalone`, because that artifact has class conflicts with ShardingSphere. This was reported in https://issues.apache.org/jira/browse/HIVE-28308 and fixed in [HIVE-28315: Missing classes while using hive jdbc standalone jar hive#5313](https://github.com/apache/hive/pull/5313). This is a PR on the `apache/hive:4.0.1` milestone released 2 months later. So you have to live with the large, intrusive dependency tree of the non-standalone `org.apache.hive:hive-jdbc:4.0.0`. Please note that Hive does not support Hadoop 3.4.x yet. The CI for [HIVE-28191. Upgrade Hadoop Version to 3.4.0. hive#5187](https://github.com/apache/hive/pull/5187) is still broken. The HiveServer2 JDBC Driver uses multiple versions of the Hadoop API internally, which makes it a terrible dependency.
> * Create several HiveServer2 Docker containers as shown in [Add GraalVM Reachability Metadata and corresponding nativeTest for HiveServer2 #31526](https://github.com/apache/shardingsphere/pull/31526), and then write ShardingSphere configuration files as you see fit.
>
> ```yaml
> mode:
>   type: Standalone
>   repository:
>     type: JDBC
>
> dataSources:
>   ds_0:
>     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
>     driverClassName: org.apache.hive.jdbc.HiveDriver
>     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_0
>   ds_1:
>     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
>     driverClassName: org.apache.hive.jdbc.HiveDriver
>     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_1
>   ds_2:
>     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
>     driverClassName: org.apache.hive.jdbc.HiveDriver
>     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_2
>
> rules:
> - !SHARDING
>   tables:
>     t_order:
>       actualDataNodes:
>       keyGenerateStrategy:
>         column: order_id
>         keyGeneratorName: snowflake
>     t_order_item:
>       actualDataNodes:
>       keyGenerateStrategy:
>         column: order_item_id
>         keyGeneratorName: snowflake
>   defaultDatabaseStrategy:
>     standard:
>       shardingColumn: user_id
>       shardingAlgorithmName: inline
>   shardingAlgorithms:
>     inline:
>       type: CLASS_BASED
>       props:
>         strategy: STANDARD
>         algorithmClassName: org.apache.shardingsphere.test.natived.jdbc.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
>   keyGenerators:
>     snowflake:
>       type: SNOWFLAKE
>   auditors:
>     sharding_key_required_auditor:
>       type: DML_SHARDING_CONDITIONS
>
> - !BROADCAST
>   tables:
>     - t_address
>
> props:
>   sql-show: false
> ```
>
> * Feel free to test some Select, Insert, and Delete SQL on the ShardingSphere JDBC DataSource. Please note that `org.apache.shardingsphere:shardingsphere-parser-sql-hive` cannot yet parse `Create Table`, `Set`, or `TRUNCATE TABLE` statements. Feel free to submit a PR.
> * The master branch of ShardingSphere is currently tested with the HiveServer2 JDBC Driver of `apache/hive:3.1.3`. Although you can compile ShardingSphere with JDK11-JDK22 and run the unit tests for the Hive integration on the ShardingSphere side on JDK8-JDK22, I would still say you should be careful. After all, the master branch of Hive can only be compiled with JDK8.
> * I would also suggest raising a PR on the Hive side to implement the `org.apache.hive.jdbc.HiveDatabaseMetaData#getURL()` involved in https://github.com/apache/hive/blob/rel/release-4.0.0/jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java#L747. Currently, you can only connect to HiveServer2 through HikariCP, and the relevant logic is hard-coded in https://github.com/apache/shardingsphere/blob/aa18c5d581db136475c79b292d4a1019d3dae9bc/infra/common/src/main/java/org/apache/shardingsphere/infra/database/DatabaseTypeEngine.java#L127.
>
> ```java
> /**
>  * Get storage type.
>  * Similar to apache/hive 4.0.0's `org.apache.hive.jdbc.HiveDatabaseMetaData`, it does not implement {@link java.sql.DatabaseMetaData#getURL()}.
>  * So use {@link CatalogSwitchableDataSource#getUrl()} and {@link ReflectionUtils#getFieldValue(Object, String)} to try fuzzy matching.
>  *
>  * @param dataSource data source
>  * @return storage type
>  * @throws SQLWrapperException SQL wrapper exception
>  */
> public static DatabaseType getStorageType(final DataSource dataSource) {
>     try (Connection connection = dataSource.getConnection()) {
>         return DatabaseTypeFactory.get(connection.getMetaData().getURL());
>     } catch (final SQLFeatureNotSupportedException sqlFeatureNotSupportedException) {
>         if (dataSource instanceof CatalogSwitchableDataSource) {
>             return DatabaseTypeFactory.get(((CatalogSwitchableDataSource) dataSource).getUrl());
>         }
>         if (dataSource.getClass().getName().equals(new HikariDataSourcePoolMetaData().getType())) {
>             HikariDataSourcePoolFieldMetaData dataSourcePoolFieldMetaData = new HikariDataSourcePoolFieldMetaData();
>             String jdbcUrlFieldName = ReflectionUtils.<String>getFieldValue(dataSource, dataSourcePoolFieldMetaData.getJdbcUrlFieldName())
>                     .orElseThrow(() -> new SQLWrapperException(sqlFeatureNotSupportedException));
>             return DatabaseTypeFactory.get(jdbcUrlFieldName);
>         }
>         throw new SQLWrapperException(sqlFeatureNotSupportedException);
>     } catch (final SQLException ex) {
>         throw new SQLWrapperException(ex);
>     }
> }
> ```
>
> # Simple summary
> * Support for HiveServer2 is on the ShardingSphere 5.5.1 milestone, which has not yet been officially released.
> * `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` has a known TODO. You should either delete this class before compiling the master branch of ShardingSphere, or implement the TODO marked in this class.
> * The dependency management of the HiveServer2 JDBC Driver is a disaster. I suggest you test it on ShardingSphere JDBC before testing ShardingSphere Proxy.

thanks a lot

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
