RexLyc opened a new issue, #10350:
URL: https://github.com/apache/seatunnel/issues/10350

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   I have been using SeaTunnel recently, and I suspect that some connectors,
   such as S3/Sftp/LocalFs, do not perform a nullable check. These connectors
   read from and write to files, and in most cases they work well. But when I
   mistakenly declare a column in the schema under a name that does not exist
   in the file, the behavior gets weird.
   
   For example, my CSV file looks like this:
   ```
   a,b
   1,2
   ```
   
   and my schema looks like this:
   ```
   schema {
       columns = [
           {
               name = b
               type = int
           },
           {
               name = c
               type = int
           }
       ]
   }
   ```
   
   I expected an error, because column `c` doesn't exist in the file. Instead,
   I get an empty value:
   ```
   b,c
   2,
   ```
   
   After seeing this, I expected to get an error if I set `nullable = false` on `c`:
   ```
   schema {
       columns = [
           {
               name = b
               type = int
           },
           {
               name = c
               type = int
               nullable = false
           }
       ]
   }
   ```
   
   But I still got the result above.
   
   So there might be two problems:
   1. The deserialization schema class, at least for the CSV format, doesn't
      check whether the schema declares fields that are missing from the actual
      data file.
   2. When the user provides a wrong schema (i.e., a column that doesn't exist
      in the file), we should check for it and report an error.
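
To make the first point concrete, here is a minimal sketch of the kind of check I have in mind, assuming the header line is available when `csv_use_header_line = true`. The class `CsvSchemaHeaderCheck` and its methods are hypothetical names for illustration, not existing SeaTunnel code, and the plain `split` ignores quoted fields that contain the delimiter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper (not SeaTunnel API): compares schema column names with a CSV header. */
final class CsvSchemaHeaderCheck {

    /** Returns the schema columns that do not appear in the header line, in schema order. */
    static List<String> missingColumns(String headerLine, String delimiter, List<String> schemaColumns) {
        Set<String> headerFields = new HashSet<>();
        // Pattern.quote makes delimiters such as "|" literal; quoted CSV fields are not handled here.
        for (String field : headerLine.split(Pattern.quote(delimiter), -1)) {
            headerFields.add(field.trim());
        }
        List<String> missing = new ArrayList<>();
        for (String column : schemaColumns) {
            if (!headerFields.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    /** Fails fast instead of silently emitting empty values for unknown columns. */
    static void validate(String headerLine, String delimiter, List<String> schemaColumns) {
        List<String> missing = missingColumns(headerLine, delimiter, schemaColumns);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                    "Schema columns not found in CSV header: " + missing);
        }
    }
}
```

With the header `a,b` and the schema columns `b` and `c` from the example above, `missingColumns` returns `[c]` and `validate` throws an `IllegalArgumentException`.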
   
   In fact, I'm not sure if this is a feature or a bug.
   
   ### SeaTunnel Version
   
   2.3.12
   
   ### SeaTunnel Config
   
   ```conf
   env {
     job.mode = "BATCH"
     job.name = "seatunnel_job"
     parallelism = 1
   }
   
   source {
     LocalFile {
       path = "/home/kpad/connector/data/localfs"
       file_format_type = "csv"
       file_filter_pattern = "alice-greater-than"
       csv_use_header_line = true
       field_delimiter = ","
       schema {
         columns = [
           {
             name = greater_than
             type = bigint
             nullable = false
           },
           {
             name = x
             type = double
             nullable = true
           }
         ]
       }
     }
   }
   
   sink {
     LocalFile {
       path = "/home/kpad/var/storage/data"
       file_format_type = "csv"
       single_file_mode = true
       custom_filename = true
       is_enable_transaction = false
       enable_header_write = true
       create_empty_file_when_no_data = true
       file_name_expression = "greater-than-dataset"
       filename_extension = ".csv"
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   # Launched from a Go program via exec.Command (locally, inside my Docker container)
   
   seatunnel.sh -c seatunnel.conf -m local
   ```
   
   ### Error Exception
   
   ```log
   No exception is thrown. I'm wondering why no nullable error appeared.
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   <img width="3420" height="1904" alt="Image" src="https://github.com/user-attachments/assets/44cf312f-87f3-41bf-9719-7b68582b52ce" />
   The code in this screenshot doesn't check whether the column should exist.
   As I said above, a user might specify a wrong column name that never appears
   in the actual data file.
   
   I'm still looking for the `SeaTunnelRow` processing code that is supposed to
   perform the nullable check.
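
For reference, this is roughly the nullable check I expected to find somewhere in the row processing. It is only a sketch under my own assumptions: `NullableCheck` and its nested `Column` class are hypothetical stand-ins, not SeaTunnel types, and the row is modeled as a plain `Object[]`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not SeaTunnel API): finds nulls in columns declared nullable = false. */
final class NullableCheck {

    /** Minimal stand-in for a schema column: its name and its nullable flag. */
    static final class Column {
        final String name;
        final boolean nullable;

        Column(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    /** Returns the names of non-nullable columns whose value in the row is null. */
    static List<String> violations(List<Column> columns, Object[] row) {
        List<String> violated = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            // A row shorter than the schema is treated as null in the trailing fields.
            Object value = i < row.length ? row[i] : null;
            if (value == null && !columns.get(i).nullable) {
                violated.add(columns.get(i).name);
            }
        }
        return violated;
    }
}
```

For the row `2,` from my example, with `b` assumed nullable and `c` declared `nullable = false`, `violations` returns `[c]`, which is the point where I would expect the job to fail.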
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


