zhanggougou opened a new issue, #6089:
URL: https://github.com/apache/seatunnel/issues/6089

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   When mmap is used to speed up file generation, the generated file may contain <0x00> bytes.
   My ClickHouse local version is 22.3.17.13, and the <0x00> bytes cause clickhouse local to throw an exception.
   So I made a change that uses fileChannel.position() to keep <0x00> out of the file.
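
The effect described above can be reproduced in isolation. The following is a minimal sketch, not the patch from this issue (which is only visible in the screenshots): it assumes the file is pre-sized to a fixed region before being mapped, and it trims the file to the number of bytes actually written, taken from the buffer position. The class name `MmapPaddingDemo`, the method names, and the 64-byte region size are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapPaddingDemo {

    /**
     * Writes data through a memory-mapped region of regionSize bytes and
     * returns the number of bytes written. When truncate is true, the file
     * is cut back to that length so no unwritten padding remains.
     */
    public static int writeMapped(Path file, byte[] data, int regionSize, boolean truncate)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            // Pre-size the file; on common platforms the extended part reads as 0x00.
            raf.setLength(regionSize);
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, regionSize);
            buffer.put(data);   // throws BufferOverflowException if data exceeds regionSize
            buffer.force();     // flush the mapped pages to the file
            int written = buffer.position();
            if (truncate) {
                // Assumption: truncating a file that is still mapped works on Linux;
                // it may fail on Windows while the mapping is alive.
                channel.truncate(written);
            }
            return written;
        }
    }

    /** Counts 0x00 bytes in a file; intended for small test files only. */
    public static long countZeroBytes(Path file) throws IOException {
        long zeros = 0;
        for (byte b : Files.readAllBytes(file)) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) throws IOException {
        byte[] row = "1\tfoo\n".getBytes(StandardCharsets.UTF_8);
        Path padded = Files.createTempFile("mmap-padded", ".tsv");
        Path trimmed = Files.createTempFile("mmap-trimmed", ".tsv");

        writeMapped(padded, row, 64, false);
        writeMapped(trimmed, row, 64, true);

        System.out.println("padded size=" + Files.size(padded)
                + " zero bytes=" + countZeroBytes(padded));
        System.out.println("trimmed size=" + Files.size(trimmed)
                + " zero bytes=" + countZeroBytes(trimmed));
    }
}
```

Without the truncation step the file keeps the full mapped length, so everything after the last written row is padding that a tab-separated parser has to interpret as data.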
   
   
   ### SeaTunnel Version
   
   2.3.1
   
   ### SeaTunnel Config
   
   ```conf
   env {
       spark.app.name = "hive_to_ck_file_online"
       spark.yarn.queue="root.private"
       spark.executor.instances = 20
       spark.executor.cores = 1
       spark.executor.memory = 16g
       spark.sql.catalogImplementation = "hive"
       spark.executor.extraJavaOptions = "-Dfile.encoding=UTF-8"
       spark.driver.extraJavaOptions = "-Dfile.encoding=UTF-8"
       spark.hadoop.hive.exec.dynamic.partition = "true"
       spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
       spark.debug.maxToStringFields = 100000
       spark.speculation=false
       spark.yarn.maxAppAttempts=1
       spark.yarn.max.executor.failures=1
       spark.stage.maxConsecutiveAttempts=1
       spark.blacklist.enabled=false
   }
   source {
       Hive {
         metastore_uri="****"
         table_name="st_site.ch_push_test"
         read_partitions = ["ds=20231103","ds=20231104","ds=20231105","ds=20231106","ds=20231107","ds=20231108","ds=20231109","ds=20231110","ds=20231111","ds=20231112","ds=20231113","ds=20231114","ds=20231115","ds=20231116","ds=20231117","ds=20231118","ds=20231119","ds=20231120","ds=20231121","ds=20231122","ds=20231123","ds=20231124","ds=20231125","ds=20231126","ds=20231127","ds=20231128","ds=20231129","ds=20231130","ds=20231201","ds=20231202","ds=20231203"]
         parallelism= 1000
       }
   }
   sink{
       ClickhouseFile {
         host = "****"
         database = "db_test"
         table = "t_st_bill_line_weightrange_detail6"
         username = "****"
         password = "****"
         clickhouse_local_path = "/usr/bin/clickhouse local"
         node_pass = [{
           node_address = "****"
           username="****"
           password = "****"
         }
       ]
       }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark-2-connector-v2.sh --master yarn --deploy-mode cluster --config ${1}
   ```
   
   
   ### Error Exception
   
   ```log
   ERROR ClickhouseFileSinkWriter: Code: 27. DB::Exception: Cannot parse input: expected '\t' at end of stream.: Buffer has gone, cannot extract information about what has been parsed.: While executing TabSeparatedRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
   ```
   
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   This is the data file:
   <img width="1222" alt="image" src="https://github.com/apache/seatunnel/assets/25924003/746bf928-52fd-425c-8b3d-7127a230a593">
   
   This is my fix. With it, the file no longer contains <0x00> and clickhouse local runs successfully:
   <img width="1365" alt="image" src="https://github.com/apache/seatunnel/assets/25924003/062dea73-066d-4a69-b854-bdaced6aad64">
   <img width="1357" alt="image" src="https://github.com/apache/seatunnel/assets/25924003/035bacbb-2aff-441d-a778-9e61219084c8">
   
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

