Re: [I] How to use shardingsphere-proxy to proxy the hive db [shardingsphere]

via GitHub Mon, 22 Jul 2024 07:38:33 -0700


linghengqian commented on issue #32223:
URL: https://github.com/apache/shardingsphere/issues/32223#issuecomment-2243123751


   # TL;DR
   - What follows may ramble a bit, but I do hope that people who know Hive can help with the problems the master branch runs into on HiveServer2, whether that is the master branch of ShardingSphere or the master branch of Hive.
   
   - We can leave ShardingSphere Proxy aside and discuss ShardingSphere JDBC first; both require optional modules to parse the Hive dialect. First, build ShardingSphere and install its artifacts into your local Maven repository. Before building, you need to remove `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader`, as in https://github.com/apache/shardingsphere/pull/31526 , to prevent the creation of an embedded HiveServer2, or try to raise a PR to remove this file, which is marked TODO. Apparently this code involves reading a string like `thrift://<host_name>:<port>`. Feel free to check https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration
   ```shell
   sdk install java 22.0.1-graalce
   sdk use java 22.0.1-graalce
   
   git clone https://github.com/apache/shardingsphere.git
   cd ./shardingsphere/
   git reset --hard 45b69b4d0d249f31b01ce963d82debb7de751da4
   ./mvnw clean install -Prelease -T1C -DskipTests -Djacoco.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true
   ```
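   - As a rough illustration of what reading a string like `thrift://<host_name>:<port>` could look like, here is a minimal sketch that only uses the JDK. It is not the code in `HiveMetaDataLoader`; the class name `MetastoreUriSketch`, the `Endpoint` holder and the example address `thrift://localhost:9083` are my own assumptions, and it handles a single URI only.
   ```java
   import java.net.URI;

   public class MetastoreUriSketch {

       /** Hypothetical holder for the two parts of a thrift:// address. */
       public static final class Endpoint {
           public final String host;
           public final int port;

           Endpoint(String host, int port) {
               this.host = host;
               this.port = port;
           }
       }

       /** Parses one thrift://host:port string and rejects anything else. */
       public static Endpoint parse(String uri) {
           URI parsed = URI.create(uri.trim());
           if (!"thrift".equalsIgnoreCase(parsed.getScheme())) {
               throw new IllegalArgumentException("Expected a thrift:// URI but got: " + uri);
           }
           // java.net.URI reports a missing host as null and a missing port as -1.
           if (parsed.getHost() == null || parsed.getPort() < 0) {
               throw new IllegalArgumentException("Host and port are both required: " + uri);
           }
           return new Endpoint(parsed.getHost(), parsed.getPort());
       }

       public static void main(String[] args) {
           // Example address only; replace it with your own metastore location.
           Endpoint endpoint = parse("thrift://localhost:9083");
           System.out.println(endpoint.host + ":" + endpoint.port);
       }
   }
   ```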
   - For ShardingSphere JDBC, you may need to add the following dependencies to your project's `pom.xml` to connect to HiveServer2. You can refer to https://github.com/apache/shardingsphere/pull/31526 .
   ```xml
   <project>
       <dependencies>
           <dependency>
               <groupId>org.apache.shardingsphere</groupId>
               <artifactId>shardingsphere-jdbc</artifactId>
               <version>5.5.1-SNAPSHOT</version>
           </dependency>
           <dependency>
               <groupId>org.apache.shardingsphere</groupId>
               <artifactId>shardingsphere-infra-database-hive</artifactId>
               <version>5.5.1-SNAPSHOT</version>
           </dependency>
           <dependency>
               <groupId>org.apache.shardingsphere</groupId>
               <artifactId>shardingsphere-parser-sql-hive</artifactId>
               <version>5.5.1-SNAPSHOT</version>
           </dependency>
           <dependency>
               <groupId>org.apache.hive</groupId>
               <artifactId>hive-jdbc</artifactId>
               <version>4.0.0</version>
           </dependency>
           <dependency>
               <groupId>org.apache.hive</groupId>
               <artifactId>hive-service</artifactId>
               <version>4.0.0</version>
               <exclusions>
                   <exclusion>
                       <groupId>org.apache.logging.log4j</groupId>
                       <artifactId>log4j-slf4j-impl</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.slf4j</groupId>
                       <artifactId>slf4j-reload4j</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.apache.logging.log4j</groupId>
                       <artifactId>log4j-api</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.antlr</groupId>
                       <artifactId>antlr4-runtime</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.codehaus.janino</groupId>
                       <artifactId>commons-compiler</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.apache.commons</groupId>
                       <artifactId>commons-dbcp2</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>commons-io</groupId>
                       <artifactId>commons-io</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>commons-lang</groupId>
                       <artifactId>commons-lang</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.apache.commons</groupId>
                       <artifactId>commons-pool2</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.codehaus.janino</groupId>
                       <artifactId>janino</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>com.fasterxml.woodstox</groupId>
                       <artifactId>woodstox-core</artifactId>
                   </exclusion>
                   <exclusion>
                       <groupId>org.bouncycastle</groupId>
                       <artifactId>bcprov-jdk15on</artifactId>
                   </exclusion>
               </exclusions>
           </dependency>
           <dependency>
               <groupId>org.apache.hadoop</groupId>
               <artifactId>hadoop-client-api</artifactId>
               <version>3.3.5</version>
           </dependency>
       </dependencies>
   </project>
   ```
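   - The exclusions above hint at how much Hive pulls in transitively. If you suspect a class conflict, one way to check is to list every classpath location that provides a given class. The following is a minimal sketch under my own assumptions, not something ShardingSphere or Hive ships: the class name `ClassProviderSketch` is made up, and `org.antlr.v4.runtime.Parser` (from the excluded `antlr4-runtime`) is only an illustrative default.
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.net.URL;
   import java.util.Collections;
   import java.util.List;

   public class ClassProviderSketch {

       /** Returns every classpath location that contains the given class file. */
       public static List<URL> providersOf(String className) {
           String resource = className.replace('.', '/') + ".class";
           ClassLoader loader = Thread.currentThread().getContextClassLoader();
           if (loader == null) {
               loader = ClassLoader.getSystemClassLoader();
           }
           try {
               return Collections.list(loader.getResources(resource));
           } catch (IOException ex) {
               throw new UncheckedIOException(ex);
           }
       }

       public static void main(String[] args) {
           // Illustrative default; pass any fully qualified class name as the first argument.
           String className = args.length > 0 ? args[0] : "org.antlr.v4.runtime.Parser";
           List<URL> providers = providersOf(className);
           System.out.println(className + " is provided by " + providers.size() + " location(s)");
           for (URL provider : providers) {
               System.out.println("  " + provider);
           }
       }
   }
   ```
   More than one location for the same class is a hint that two jars ship it, which is worth looking at before blaming ShardingSphere or Hive.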
   - I would say you can't use `org.apache.hive:hive-jdbc:4.0.0` with the `standalone` classifier, because that artifact has class conflicts with ShardingSphere. This was reported in https://issues.apache.org/jira/browse/HIVE-28308 and fixed in https://github.com/apache/hive/pull/5313 . That PR is on the `apache/hive:4.0.1` milestone, released 2 months later. So you have to deal with the huge and impactful dependency tree of `org.apache.hive:hive-jdbc:4.0.0`. Please note that Hive does not support Hadoop 3.4.x yet; the CI for https://github.com/apache/hive/pull/5187 is still broken. The HiveServer2 JDBC Driver uses multiple versions of the Hadoop API internally, which makes it a terrible dependency.
   - Create several HiveServer2 Docker containers as shown in https://github.com/apache/shardingsphere/pull/31526 , and then write a ShardingSphere configuration file as you see fit.
   ```yaml
   mode:
     type: Standalone
     repository:
       type: JDBC
   
   dataSources:
     ds_0:
       dataSourceClassName: com.zaxxer.hikari.HikariDataSource
       driverClassName: org.apache.hive.jdbc.HiveDriver
       jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_0
     ds_1:
       dataSourceClassName: com.zaxxer.hikari.HikariDataSource
       driverClassName: org.apache.hive.jdbc.HiveDriver
       jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_1
     ds_2:
       dataSourceClassName: com.zaxxer.hikari.HikariDataSource
       driverClassName: org.apache.hive.jdbc.HiveDriver
       jdbcUrl: jdbc:hive2://localhost:10000/demo_ds_2
   
   rules:
   - !SHARDING
     tables:
       t_order:
         actualDataNodes:
         keyGenerateStrategy:
           column: order_id
           keyGeneratorName: snowflake
       t_order_item:
         actualDataNodes:
         keyGenerateStrategy:
           column: order_item_id
           keyGeneratorName: snowflake
     defaultDatabaseStrategy:
       standard:
         shardingColumn: user_id
         shardingAlgorithmName: inline
     shardingAlgorithms:
       inline:
         type: CLASS_BASED
         props:
           strategy: STANDARD
           algorithmClassName: org.apache.shardingsphere.test.natived.jdbc.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
     keyGenerators:
       snowflake:
         type: SNOWFLAKE
     auditors:
       sharding_key_required_auditor:
         type: DML_SHARDING_CONDITIONS
   
   - !BROADCAST
     tables:
       - t_address
   
   props:
     sql-show: false
   ```
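   - The `algorithmClassName` above points at a test fixture, so in your own project you would need to supply your own class. With `strategy: STANDARD`, that class would have to implement ShardingSphere's `StandardShardingAlgorithm` interface. To keep the example free of dependencies, the sketch below shows only the routing decision such a class might make. The modulo rule, the `ds_` prefix handling and the class name `InlineRoutingSketch` are my assumptions, not the fixture's actual code.
   ```java
   import java.util.Arrays;
   import java.util.Collection;

   public class InlineRoutingSketch {

       /** Picks the data source named ds_(userId mod number of targets), if it is available. */
       public static String route(Collection<String> availableTargetNames, long userId) {
           long index = Math.floorMod(userId, (long) availableTargetNames.size());
           String wanted = "ds_" + index;
           for (String each : availableTargetNames) {
               if (each.equals(wanted)) {
                   return each;
               }
           }
           throw new IllegalStateException("No data source named " + wanted);
       }

       public static void main(String[] args) {
           Collection<String> targets = Arrays.asList("ds_0", "ds_1", "ds_2");
           // Prints the data source chosen for user_id = 7 under the assumed modulo rule.
           System.out.println(route(targets, 7L));
       }
   }
   ```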
   - Feel free to test some `SELECT`, `INSERT` and `DELETE` SQL on the ShardingSphere JDBC DataSource. Please note that `org.apache.shardingsphere:shardingsphere-parser-sql-hive` does not yet parse `CREATE TABLE`, `SET` or `TRUNCATE TABLE` statements. Feel free to submit your PR.
   - The master branch of ShardingSphere is currently tested with the HiveServer2 JDBC Driver of `apache/hive:3.1.3`. Although you can compile ShardingSphere with JDK11-JDK22 and run the Hive integration unit tests on the ShardingSphere side on JDK8-JDK22, I would still proceed with caution. After all, the master branch of Hive can only be compiled with JDK8.
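   - To try a `SELECT` against the ShardingSphere JDBC DataSource, something like the following sketch may be enough. It assumes the YAML above is saved on the classpath under the hypothetical name `hive.yaml`, that the dependencies listed earlier are present at runtime, and that the HiveServer2 containers are running. The class name `ShardingSphereJdbcSketch` is mine; the `jdbc:shardingsphere:classpath:` URL prefix is the one used by the ShardingSphere JDBC driver.
   ```java
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   import java.sql.Statement;

   public class ShardingSphereJdbcSketch {

       /** Builds the driver URL for a YAML configuration file found on the classpath. */
       public static String jdbcUrl(String yamlOnClasspath) {
           return "jdbc:shardingsphere:classpath:" + yamlOnClasspath;
       }

       /** Runs a query through ShardingSphere JDBC and returns how many rows came back. */
       public static int countRows(String yamlOnClasspath, String selectSql) throws SQLException {
           try (Connection connection = DriverManager.getConnection(jdbcUrl(yamlOnClasspath));
                Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(selectSql)) {
               int rows = 0;
               while (resultSet.next()) {
                   rows++;
               }
               return rows;
           }
       }
   }
   ```
   A call such as `ShardingSphereJdbcSketch.countRows("hive.yaml", "SELECT * FROM t_order")` would then go through the sharding rules, provided `t_order` already exists on the HiveServer2 side.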
   # Simple summary
   - Support for HiveServer2 is on the ShardingSphere 5.5.1 milestone, which has not yet been officially released.
   - `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` has a known TODO. You should either delete this class before compiling the master branch of ShardingSphere, or implement the TODO marked in this class.
   - The dependency management of HiveServer2 JDBC Driver is a disaster. I 
suggest you test it on ShardingSphere JDBC before testing ShardingSphere Proxy.


