Re: [I] How to use shardingsphere-proxy to proxy the hive db [shardingsphere]

via GitHub Mon, 22 Jul 2024 18:33:43 -0700


hongjunsu commented on issue #32223:
URL: 
https://github.com/apache/shardingsphere/issues/32223#issuecomment-2244084864


   > # TL;DR
   > * What I am going to say next may sound a bit like nonsense, but I do hope 
that friends with Hive knowledge can help with the problems the master branch 
has on HiveServer2, whether that means the master branch of ShardingSphere or 
the master branch of Hive.
   > * We can leave ShardingSphere Proxy aside and discuss ShardingSphere JDBC 
first. Both require optional modules to parse the Hive dialect. First, build 
ShardingSphere and install its artifacts into the local Maven repository. 
Before building, you need to remove 
`org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader`
 as in [Add GraalVM Reachability Metadata and corresponding nativeTest for 
HiveServer2 #31526](https://github.com/apache/shardingsphere/pull/31526) to 
prevent the creation of an embedded HiveServer2, or try to raise a PR to remove 
this file, which is marked TODO. Apparently the TODO involves reading a string 
like `thrift://<host_name>:<port>`. Feel free to check 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration
   > 
   > ```shell
   > sdk install java 22.0.1-graalce
   > sdk use java 22.0.1-graalce
   > 
   > git clone git@github.com:apache/shardingsphere.git
   > cd ./shardingsphere/
   > git reset --hard 45b69b4d0d249f31b01ce963d82debb7de751da4
   > ./mvnw clean install -Prelease -T1C -DskipTests -Djacoco.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true
   > ```
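
The `thrift://<host_name>:<port>` string mentioned above is an ordinary URI, so a minimal sketch of reading one could rely on `java.net.URI` from the JDK. The class name `MetastoreUriSketch`, the `parse` method, and the sample host and port are hypothetical names chosen for illustration; this is not how `HiveMetaDataLoader` is or should be implemented.

```java
import java.net.URI;

// Hypothetical helper, not part of ShardingSphere or Hive.
final class MetastoreUriSketch {

    /** Holds the two parts a client needs to reach a metastore. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Splits a string shaped like thrift://<host_name>:<port> into host and port. */
    static Endpoint parse(String metastoreUri) {
        URI uri = URI.create(metastoreUri.trim());
        // Reject anything that is not a thrift URI with an explicit host and port.
        if (!"thrift".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("Expected thrift://<host_name>:<port>, got: " + metastoreUri);
        }
        return new Endpoint(uri.getHost(), uri.getPort());
    }

    public static void main(String[] args) {
        // Sample value only; use the address of your own metastore.
        Endpoint endpoint = parse("thrift://localhost:9083");
        System.out.println(endpoint.host + ":" + endpoint.port); // prints "localhost:9083"
    }
}
```

Validating the scheme up front means a JDBC URL passed in by mistake fails with a clear message instead of producing a null host later.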
   > 
   > * For ShardingSphere JDBC, you may need to add the following dependencies 
to your project's `pom.xml` to connect to HiveServer2. You can refer to [Add 
GraalVM Reachability Metadata and corresponding nativeTest for HiveServer2 
#31526](https://github.com/apache/shardingsphere/pull/31526).
   > 
   > ```xml
   > <project>
   >     <dependencies>
   >        <dependency>
   >          <groupId>org.apache.shardingsphere</groupId>
   >          <artifactId>shardingsphere-jdbc</artifactId>
   >          <version>5.5.1-SNAPSHOT</version>
   >        </dependency>
   >        <dependency>
   >             <groupId>org.apache.shardingsphere</groupId>
   >             <artifactId>shardingsphere-infra-database-hive</artifactId>
   >             <version>5.5.1-SNAPSHOT</version>
   >        </dependency>
   >        <dependency>
   >           <groupId>org.apache.shardingsphere</groupId>
   >           <artifactId>shardingsphere-parser-sql-hive</artifactId>
   >           <version>5.5.1-SNAPSHOT</version>
   >        </dependency>
   >        <dependency>
   >           <groupId>org.apache.hive</groupId>
   >           <artifactId>hive-jdbc</artifactId>
   >           <version>4.0.0</version>
   >        </dependency>
   >        <dependency>
   >           <groupId>org.apache.hive</groupId>
   >           <artifactId>hive-service</artifactId>
   >           <version>4.0.0</version>
   >           <exclusions>
   >              <exclusion>
   >                  <groupId>org.apache.logging.log4j</groupId>
   >                  <artifactId>log4j-slf4j-impl</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.slf4j</groupId>
   >                  <artifactId>slf4j-reload4j</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.apache.logging.log4j</groupId>
   >                  <artifactId>log4j-api</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.antlr</groupId>
   >                  <artifactId>antlr4-runtime</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.codehaus.janino</groupId>
   >                  <artifactId>commons-compiler</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.apache.commons</groupId>
   >                  <artifactId>commons-dbcp2</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>commons-io</groupId>
   >                  <artifactId>commons-io</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>commons-lang</groupId>
   >                  <artifactId>commons-lang</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.apache.commons</groupId>
   >                  <artifactId>commons-pool2</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>org.codehaus.janino</groupId>
   >                  <artifactId>janino</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                  <groupId>com.fasterxml.woodstox</groupId>
   >                  <artifactId>woodstox-core</artifactId>
   >              </exclusion>
   >              <exclusion>
   >                 <groupId>org.bouncycastle</groupId>
   >                 <artifactId>bcprov-jdk15on</artifactId>
   >              </exclusion>
   >           </exclusions>
   >        </dependency>
   >        <dependency>
   >           <groupId>org.apache.hadoop</groupId>
   >           <artifactId>hadoop-client-api</artifactId>
   >           <version>3.3.5</version>
   >        </dependency>
   >     </dependencies>
   > </project>
   > ```
   > 
   > * I would say you can't use `org.apache.hive:hive-jdbc:4.0.0` with the 
`standalone` classifier, because that artifact has class conflicts with 
ShardingSphere. This was reported in 
https://issues.apache.org/jira/browse/HIVE-28308 and fixed in 
[HIVE-28315: Missing classes while using hive jdbc standalone jar 
hive#5313](https://github.com/apache/hive/pull/5313). This is a PR on the 
`apache/hive:4.0.1` milestone released 2 months later. So you have to deal with 
the huge and impactful dependency tree of the plain 
`org.apache.hive:hive-jdbc:4.0.0`. Please note that Hive does not support 
Hadoop 3.4.x yet: the CI for [HIVE-28191. Upgrade Hadoop Version to 3.4.0. 
hive#5187](https://github.com/apache/hive/pull/5187) is still broken. The 
HiveServer2 JDBC Driver uses multiple versions of the Hadoop API internally, 
which makes it a terrible dependency.
   > * Create several HiveServer2 Docker containers as shown in [Add GraalVM 
Reachability Metadata and corresponding nativeTest for HiveServer2 
#31526](https://github.com/apache/shardingsphere/pull/31526), and then write 
the ShardingSphere configuration files as you see fit.
   > 
   > ```yaml
   > mode:
   >   type: Standalone
   >   repository:
   >     type: JDBC
   > 
   > dataSources:
   >   ds_0:
   >     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
   >     driverClassName: org.apache.hive.jdbc.HiveDriver
   >     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_0
   >   ds_1:
   >     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
   >     driverClassName: org.apache.hive.jdbc.HiveDriver
   >     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_1
   >   ds_2:
   >     dataSourceClassName: com.zaxxer.hikari.HikariDataSource
   >     driverClassName: org.apache.hive.jdbc.HiveDriver
   >     jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_2
   > 
   > rules:
   > - !SHARDING
   >   tables:
   >     t_order:
   >       actualDataNodes:
   >       keyGenerateStrategy:
   >         column: order_id
   >         keyGeneratorName: snowflake
   >     t_order_item:
   >       actualDataNodes:
   >       keyGenerateStrategy:
   >         column: order_item_id
   >         keyGeneratorName: snowflake
   >   defaultDatabaseStrategy:
   >     standard:
   >       shardingColumn: user_id
   >       shardingAlgorithmName: inline
   >   shardingAlgorithms:
   >     inline:
   >       type: CLASS_BASED
   >       props:
   >         strategy: STANDARD
   >         algorithmClassName: org.apache.shardingsphere.test.natived.jdbc.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
   >   keyGenerators:
   >     snowflake:
   >       type: SNOWFLAKE
   >   auditors:
   >     sharding_key_required_auditor:
   >       type: DML_SHARDING_CONDITIONS
   > 
   > - !BROADCAST
   >   tables:
   >     - t_address
   > 
   > props:
   >   sql-show: false
   > ```
   > 
   > * Feel free to test some `SELECT`, `INSERT`, and `DELETE` SQL on the 
ShardingSphere JDBC DataSource. Please note that 
`org.apache.shardingsphere:shardingsphere-parser-sql-hive` cannot yet parse 
`CREATE TABLE`, `SET`, or `TRUNCATE TABLE` statements. Feel free to submit a PR.
   > * The master branch of ShardingSphere is currently tested with the 
HiveServer2 JDBC Driver of `apache/hive:3.1.3`. Although you can compile 
ShardingSphere with JDK11-JDK22 and run the Hive integration unit tests on the 
ShardingSphere side on JDK8-JDK22, I would still say you should proceed with 
care. After all, the master branch of Hive can only be compiled with JDK8.
   > * I would also suggest raising a PR on the Hive side to implement 
`org.apache.hive.jdbc.HiveDatabaseMetaData#getURL()`, which is involved in 
https://github.com/apache/hive/blob/rel/release-4.0.0/jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java#L747
 Because that method is not implemented, you can currently only connect to 
HiveServer2 through HikariCP, and the relevant fallback logic is hard-coded in 
https://github.com/apache/shardingsphere/blob/aa18c5d581db136475c79b292d4a1019d3dae9bc/infra/common/src/main/java/org/apache/shardingsphere/infra/database/DatabaseTypeEngine.java#L127
   > 
   > ```java
   > /**
   >  * Get storage type.
   >  * Similar to apache/hive 4.0.0's `org.apache.hive.jdbc.HiveDatabaseMetaData`, it does not implement {@link java.sql.DatabaseMetaData#getURL()}.
   >  * So use {@link CatalogSwitchableDataSource#getUrl()} and {@link ReflectionUtils#getFieldValue(Object, String)} to try fuzzy matching.
   >  *
   >  * @param dataSource data source
   >  * @return storage type
   >  * @throws SQLWrapperException SQL wrapper exception
   >  */
   > public static DatabaseType getStorageType(final DataSource dataSource) {
   >     try (Connection connection = dataSource.getConnection()) {
   >         return DatabaseTypeFactory.get(connection.getMetaData().getURL());
   >     } catch (final SQLFeatureNotSupportedException sqlFeatureNotSupportedException) {
   >         if (dataSource instanceof CatalogSwitchableDataSource) {
   >             return DatabaseTypeFactory.get(((CatalogSwitchableDataSource) dataSource).getUrl());
   >         }
   >         if (dataSource.getClass().getName().equals(new HikariDataSourcePoolMetaData().getType())) {
   >             HikariDataSourcePoolFieldMetaData dataSourcePoolFieldMetaData = new HikariDataSourcePoolFieldMetaData();
   >             String jdbcUrlFieldName = ReflectionUtils.<String>getFieldValue(dataSource, dataSourcePoolFieldMetaData.getJdbcUrlFieldName())
   >                     .orElseThrow(() -> new SQLWrapperException(sqlFeatureNotSupportedException));
   >             return DatabaseTypeFactory.get(jdbcUrlFieldName);
   >         }
   >         throw new SQLWrapperException(sqlFeatureNotSupportedException);
   >     } catch (final SQLException ex) {
   >         throw new SQLWrapperException(ex);
   >     }
   > }
   > ```
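
To see the fallback in the quoted method in isolation, here is a minimal JDK-only sketch. It assumes a metadata lookup that throws `SQLFeatureNotSupportedException`, as the quoted Javadoc describes for Hive 4.0.0, and a pool object that keeps the URL in a private field. Everything here is hypothetical and written for illustration: `StorageTypeFallbackSketch`, `UrlSource`, `FakePool`, the `jdbcUrl` field name, and the small prefix table. ShardingSphere itself uses `DatabaseTypeFactory`, `ReflectionUtils`, and the Hikari metadata classes shown in the quote.

```java
import java.lang.reflect.Field;
import java.sql.SQLFeatureNotSupportedException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins only; none of these types exist in ShardingSphere or Hive.
final class StorageTypeFallbackSketch {

    // Illustrative prefix table used to map a JDBC URL to a database type name.
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();

    static {
        PREFIXES.put("jdbc:hive2:", "Hive");
        PREFIXES.put("jdbc:mysql:", "MySQL");
        PREFIXES.put("jdbc:postgresql:", "PostgreSQL");
    }

    /** Stand-in for a metadata lookup whose getURL() may be unimplemented. */
    interface UrlSource {
        String getUrl() throws SQLFeatureNotSupportedException;
    }

    /** Stand-in for a connection pool that stores the JDBC URL in a private field. */
    static final class FakePool {
        private final String jdbcUrl;

        FakePool(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }
    }

    /** Matches the URL against the known prefixes. */
    static String typeOf(String jdbcUrl) {
        for (Map.Entry<String, String> entry : PREFIXES.entrySet()) {
            if (jdbcUrl.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("Unsupported JDBC URL: " + jdbcUrl);
    }

    /** Reads a String field declared on the target's class via reflection. */
    static Optional<String> readStringField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            Object value = field.get(target);
            return value instanceof String ? Optional.of((String) value) : Optional.empty();
        } catch (ReflectiveOperationException ex) {
            return Optional.empty();
        }
    }

    /** Tries the metadata first, then falls back to the pool's URL field. */
    static String storageType(UrlSource metaData, Object pool) {
        try {
            return typeOf(metaData.getUrl());
        } catch (SQLFeatureNotSupportedException ex) {
            String url = readStringField(pool, "jdbcUrl")
                    .orElseThrow(() -> new IllegalStateException(ex));
            return typeOf(url);
        }
    }

    public static void main(String[] args) {
        // Simulates a driver whose metadata does not implement getURL().
        UrlSource hiveLike = () -> {
            throw new SQLFeatureNotSupportedException("Method not supported");
        };
        FakePool pool = new FakePool("jdbc:hive2://localhost:10000/demo_ds_0");
        System.out.println(storageType(hiveLike, pool)); // prints "Hive"
    }
}
```

The sketch shows why the pool type matters: once the metadata call fails, the database type can only be recovered from a pool whose URL field is known in advance.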
   > 
   > # Simple summary
   > * Support for HiveServer2 is part of the ShardingSphere 5.5.1 milestone, 
which has not yet been officially released.
   > * 
`org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader`
 has a known TODO. You should either delete this class before compiling the 
master branch of ShardingSphere, or implement the TODO marked in this class.
   > * The dependency management of the HiveServer2 JDBC Driver is a disaster. I 
suggest you test it on ShardingSphere JDBC before testing ShardingSphere Proxy.
   
   Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
