qinbo78 opened a new issue, #9948:
URL: https://github.com/apache/seatunnel/issues/9948

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   I am using a LocalFile CSV source and a Doris sink in SeaTunnel, and I encountered three problems during data synchronization:
   
   - `csv_use_header_line` seems to have no effect: even with `csv_use_header_line = true`, I still have to define the schema manually.
   - An extra 0 at the end of CSV rows: when the CSV file is read, an extra `0` appears at the end of each row.
   - Duplicate data in Doris: after synchronization, the rows in Doris are duplicated.
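
One way to narrow down whether the trailing value and the duplicates come from the file itself or from the pipeline is to inspect the CSV before running the job. The following is a minimal sketch in plain Python, independent of SeaTunnel; the function name `inspect_csv` and the sample data are hypothetical, and it assumes a comma-delimited file whose first record is the header.

```python
import csv
import io
from collections import Counter


def inspect_csv(csv_text: str, delimiter: str = ",") -> dict:
    """Report records whose width differs from the header and exact duplicate records."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return {"columns": 0, "width_mismatch_records": [], "duplicate_rows": {}}
    header, data = rows[0], rows[1:]
    # Record numbers count the header as record 1, so data starts at 2.
    width_mismatch = [
        number for number, row in enumerate(data, start=2) if len(row) != len(header)
    ]
    counts = Counter(tuple(row) for row in data)
    duplicates = {row: n for row, n in counts.items() if n > 1}
    return {
        "columns": len(header),
        "width_mismatch_records": width_mismatch,
        "duplicate_rows": duplicates,
    }


if __name__ == "__main__":
    # Hypothetical sample: two records end with a trailing delimiter and are identical.
    sample = "platform,platform_order_no\nshop_a,1001,\nshop_a,1001,\nshop_b,1002\n"
    print(inspect_csv(sample))
```

If the file has no width mismatches and no duplicate records, the extra column and the doubled rows are more likely introduced during the job than present in the input.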
   
   ### SeaTunnel Version
   
   2.3.8
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 3
     job.mode = "BATCH"
   }
   
   source {
     LocalFile {
       path = "/data/home/conf/test.csv"
       file_format_type = "csv"
       skip_header_row_number = 1
       csv_use_header_line = true
       schema = {
         fields {
           platform = "string"
           platform_order_no = "string"
         }
       }
     }
   }
   
   
   sink {
     Doris {
       fenodes = "${doris.fenodes}"
       username = "${doris.username}"
       password = "${doris.password}"
       database = "${doris.database}"
       table = "test"
       sink.enable-2pc = "true"
       sink.label-prefix = "test_csv"
       sink.enable-delete = "true"
       doris.config = {
         format = "csv"
         column_separator = ","
       }
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   $SEATUNNEL_HOME/bin/seatunnel.sh --config ${config} --variable $secret
   ```
   
   ### Error Exception
   
   When I define only `csv_use_header_line = true` and omit the schema, this error is reported:
   
   ```log
   Caused by: org.apache.seatunnel.api.configuration.util.OptionValidationException: ErrorCode:[API-02], ErrorDescription:[Option item validate failed] - There are unconfigured options, the options('schema') are required because ['file_format_type' == TEXT || 'file_format_type' == JSON || 'file_format_type' == EXCEL || 'file_format_type' == CSV || 'file_format_type' == XML] is true.
        at org.apache.seatunnel.api.configuration.util.ConfigValidator.validate(ConfigValidator.java:200)
        at org.apache.seatunnel.api.configuration.util.ConfigValidator.validate(ConfigValidator.java:107)
        at org.apache.seatunnel.api.configuration.util.ConfigValidator.validate(ConfigValidator.java:47)
        at org.apache.seatunnel.api.table.factory.FactoryUtil.createAndPrepareSource(FactoryUtil.java:111)
        at org.apache.seatunnel.api.table.factory.FactoryUtil.createAndPrepareSource(FactoryUtil.java:74)
        ... 7 more
   ```
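
Since the validator above requires `schema` whenever `file_format_type` is CSV, one possible stopgap is to generate the `schema` block from the header line instead of typing it by hand. The sketch below is plain Python, not part of SeaTunnel; the helper name `schema_from_header` is hypothetical, and it assumes every column can be typed as `string` and that column names need no quoting.

```python
import csv
import io


def schema_from_header(csv_text: str, delimiter: str = ",") -> str:
    """Build a `schema` block that types every header column as string."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if not header:
        raise ValueError("CSV has no header line")
    lines = ["schema = {", "  fields {"]
    for name in header:
        # Assumption: all columns are strings; adjust types by hand afterwards.
        lines.append(f'    {name.strip()} = "string"')
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample using the two column names from the config in this issue.
    sample = "platform,platform_order_no\nshop_a,1001\n"
    print(schema_from_header(sample))
```

The printed block can be pasted into the `LocalFile` source. This only works around the validation error and does not address why `csv_use_header_line` is ignored.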
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
