Toroidals opened a new issue, #5435:
URL: https://github.com/apache/seatunnel/issues/5435

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   
   Exported Hive data to a ClickHouse cluster with SeaTunnel, but the data is always written to only one ClickHouse node.
   
   ### SeaTunnel Version
   
   seatunnel: apache-seatunnel-2.3.2
   spark: 3.3.1
   clickhouse: 22.8.16.32
   
   ### SeaTunnel Config
   
   ```conf
   env {
     execution.parallelism = 3
     job.mode = "BATCH"
     spark.sql.catalogImplementation = "hive"
     spark.app.name = "seatunnel-hive-to-ck_xxx"
     spark.yarn.queue = "default"
     spark.executor.instances = 16
     spark.executor.cores = 2
     spark.driver.memory = "3g"
     spark.executor.memory = "14g"
     spark.hadoop.hive.exec.dynamic.partition = "true"
     spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
     spark.sql.sources.partitionOverwriteMode = "dynamic"
     spark.executor.extraJavaOptions = "-Dfile.encoding=UTF-8"
     spark.driver.extraJavaOptions = "-Dfile.encoding=UTF-8"
   }
   
   source {
     Hive {
       metastore_uri = "thrift://xx01:9083,thrift://xx02:9083,thrift://xx03:9083"
       table_name = "dm.xxx"
       result_table_name = "source_table"
       parallelism = 16
     }
   }

   transform {
     Sql {
       source_table_name = "source_table"
       result_table_name = "sink_table"
       query = "select * from source_table"
     }
   }
   
   sink {
     Clickhouse {
       host = "xx01:8123,xx02:8123,xx03:8123,xx04:8123,xx05:8123"
       database = "dm"
       table = "xxx"
       username = "xx"
       password = "xxxx"
       parallelism = 16
       clickhouse.config = {
         max_rows_to_read = "100"
         read_overflow_mode = "throw"
         bulk_size = 100000
         retry = 3
       }
     }
   }
   ```
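
   For reference, the sink's `host` option lists five nodes, so a balanced sink would be expected to spread write batches across all of them. The sketch below is illustrative only (it is not SeaTunnel's actual implementation): it parses a comma-separated host list like the one in the config and assigns batches round-robin, which is the behavior the report says is missing.

   ```python
   # Illustrative sketch only -- not SeaTunnel's implementation. Shows the
   # expected behavior: batches rotated across the sink's host list.
   from itertools import cycle

   def parse_hosts(host_option: str) -> list[str]:
       """Split the sink's comma-separated host option into node addresses."""
       return [h.strip() for h in host_option.split(",") if h.strip()]

   hosts = parse_hosts("xx01:8123,xx02:8123,xx03:8123,xx04:8123,xx05:8123")
   picker = cycle(hosts)

   # Assign six hypothetical batches; round-robin cycles through every node.
   assignments = [next(picker) for _ in range(6)]
   print(assignments)
   # The reported bug behaves as if every batch went to hosts[0] instead.
   ```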
   
   
   ### Running Command
   
   ```shell
   /usr/local/apache-seatunnel-2.3.2/bin/start-seatunnel-spark-3-connector-v2.sh \
     --master yarn --deploy-mode client \
     --config /usr/local/apache-seatunnel-2.3.2/config/xxx.conf
   ```
   
   
   ### Error Exception
   
   ```log
   I have tried multiple times to import data into ClickHouse, but each time it only writes the data to the first node in the ClickHouse cluster list.
   ```
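
   One way to confirm the skew is to count rows on each node individually. The diagnostic sketch below (not part of SeaTunnel) only builds the `clickhouse-client` commands rather than running them; the host, database, and table values are the placeholders from the config above.

   ```python
   # Diagnostic sketch: build per-node row-count commands to confirm that
   # only one node received data. Nothing is executed here; host, database,
   # and table are the placeholders from the config above.
   def count_commands(hosts, database, table):
       cmds = []
       for host in hosts:
           name = host.split(":")[0]
           # Port 8123 in the sink config is ClickHouse's HTTP port, while
           # clickhouse-client uses the native protocol (default 9000), so
           # only the host name is reused here (assumption about the setup).
           cmds.append(
               f'clickhouse-client --host {name} '
               f'--query "SELECT count() FROM {database}.{table}"'
           )
       return cmds

   for cmd in count_commands(["xx01:8123", "xx02:8123", "xx03:8123",
                              "xx04:8123", "xx05:8123"], "dm", "xxx"):
       print(cmd)
   ```

   If the counts match the report, the first node returns all rows and the others return zero.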
   
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
