RexLyc opened a new issue, #10350: URL: https://github.com/apache/seatunnel/issues/10350
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

I have been using SeaTunnel recently, and I suspect that some connectors, such as S3/Sftp/LocalFs, do not perform a nullability check. These connectors are designed to read from and write to files, and in most cases they work well. But when I misdefine a column in the schema by using a wrong name, things get weird.

For example, my CSV file looks like this:

```
a,b
1,2
```

and my schema looks like this:

```
schema {
  columns = [
    {
      name = b
      type = int
    },
    {
      name = c
      type = int
    }
  ]
}
```

I think I should get an error, because `c` doesn't exist. Instead, I get an empty value:

```
b,c
2,
```

After seeing this, I expected to get an error if I set `nullable = false` on `c`:

```
schema {
  columns = [
    {
      name = b
      type = int
    },
    {
      name = c
      type = int
      nullable = false
    }
  ]
}
```

But I still got the result above. So there might be two problems:

1. At least for the CSV format, the deserialization schema class doesn't check whether the schema contains fields that are missing from the real data file.
2. When the user provides a wrong schema (i.e. a column that doesn't exist in the file), we should check and report an error.

In fact, I'm not sure whether this is a feature or a bug.
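To make the two points above concrete, here is a minimal sketch of the checks I would expect, written in Python only for brevity (SeaTunnel itself is Java). The function names `validate_schema_against_header` and `read_rows`, and the dict layout for columns, are hypothetical and are not SeaTunnel APIs; this is an illustration of the expected behavior, not a proposed implementation.

```python
import csv
import io


def validate_schema_against_header(header, columns):
    """Raise ValueError if a schema column is missing from the CSV header.

    header:  list of column names read from the first line of the file
    columns: list of dicts such as {"name": "c", "type": "int", "nullable": False}
             (hypothetical layout that mirrors the schema block above)
    """
    present = set(header)
    missing = [col["name"] for col in columns if col["name"] not in present]
    if missing:
        raise ValueError(
            f"schema columns not found in file header {header}: {missing}"
        )


def read_rows(text, columns):
    """Read CSV text, keep only the schema columns, and enforce nullable=False."""
    reader = csv.DictReader(io.StringIO(text))
    # Problem 2: fail fast when the schema names a column the file does not have.
    validate_schema_against_header(reader.fieldnames or [], columns)
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, record in enumerate(reader, start=2):
        row = {}
        for col in columns:
            value = record[col["name"]]
            # Problem 1: an empty value in a non-nullable column is an error.
            if value in (None, "") and not col.get("nullable", True):
                raise ValueError(
                    f"line {line_no}: column {col['name']!r} "
                    "is not nullable but is empty"
                )
            row[col["name"]] = value
        rows.append(row)
    return rows
```

With the example file `a,b` / `1,2`, calling `read_rows` with a schema containing `b` and `c` raises a `ValueError` that names `c`, instead of silently producing an empty value.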
### SeaTunnel Version

2.3.12

### SeaTunnel Config

```conf
env {
  job.mode = "BATCH"
  job.name = "seatunnel_job"
  parallelism = 1
}

source {
  LocalFile {
    path = "/home/kpad/connector/data/localfs"
    file_format_type = "csv"
    file_filter_pattern = "alice-greater-than"
    csv_use_header_line = true
    field_delimiter = ","
    schema {
      columns = [
        {
          name = greater_than
          type = bigint
          nullable = false
        },
        {
          name = x
          type = double
          nullable = true
        }
      ]
    }
  }
}

sink {
  LocalFile {
    path = "/home/kpad/var/storage/data"
    file_format_type = "csv"
    single_file_mode = true
    custom_filename = true
    is_enable_transaction = false
    enable_header_write = true
    create_empty_file_when_no_data = true
    file_name_expression = "greater-than-dataset"
    filename_extension = ".csv"
  }
}
```

### Running Command

```shell
// I'm using golang to run seatunnel (locally in my docker)
s.cmd = exec.Command("seatunnel.sh", "-c", "seatunnel.conf", "-m", "local")
```

### Error Exception

```log
I'm wondering why nullable error didn't appear.
```

### Zeta or Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

<img width="3420" height="1904" alt="Image" src="https://github.com/user-attachments/assets/44cf312f-87f3-41bf-9719-7b68582b52ce" />

This place doesn't check whether the column should exist. As I said above, a user might give a wrong column name that will never appear in the real data file. I'm still looking for the `SeaTunnelRow` processing code where the nullability check is supposed to happen.

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
