Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

via GitHub Thu, 21 Nov 2024 01:51:12 -0800


zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490607304


> Forgot to add: although .xlsx files cannot be read after being compressed with gz, .xls files can. This can be added for testing later. cc @Hisoka-X @zhdech

When testing .xls locally, I get an error saying it is not supported:
   
![image](https://github.com/user-attachments/assets/e40d3da6-4e6a-4d96-8a37-ac3d5a1b03ce)
```
env {
  parallelism = 1
  job.mode = "BATCH"
  # You can set spark configuration here
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}

source {
  LocalFile {
    # gzip-compressed .xls fixture used for this test
    path = "/seatunnel/read/gz/excel/single/e2e-xls-gz.xls.gz"
    result_table_name = "fake"
    file_format_type = excel
    # gz codec: the feature added by this PR
    archive_compress_codec = "gz"
    field_delimiter = ";"
    skip_header_row_number = 1
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          c_map = "map<string, string>"
          c_array = "array<int>"
          c_string = string
          c_boolean = boolean
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
          c_bytes = bytes
          c_date = date
          c_decimal = "decimal(38, 18)"
          c_timestamp = timestamp
        }
      }
    }
  }
}

sink {
  Assert {
    rules {
      # Expect exactly 5 rows
      row_rules = [
        {
          rule_type = MAX_ROW
          rule_value = 5
        },
        {
          rule_type = MIN_ROW
          rule_value = 5
        }
      ],
      field_rules = [
        {
          field_name = c_string
          field_type = string
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        },
        {
          field_name = c_boolean
          field_type = boolean
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        },
        {
          field_name = c_double
          field_type = double
          field_value = [
            {
              rule_type = NOT_NULL
            }
          ]
        }
      ]
    }
  }
}
```
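To rule out a bad test fixture before digging into the connector, one option is to check what the `.gz` actually decompresses to. The sketch below is a standalone helper (not part of SeaTunnel; the function name is hypothetical) that reads the first decompressed bytes and compares them with the well-known container signatures: legacy `.xls` is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`, while `.xlsx` is a ZIP archive starting with `PK\x03\x04`. It assumes the input is valid gzip; otherwise `gzip` raises an `OSError`.

```python
import gzip
import sys

# Leading bytes that identify the two Excel container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx (Office Open XML is a ZIP archive)


def sniff_excel_gz(path: str) -> str:
    """Decompress the start of a .gz file and report which Excel format it holds."""
    with gzip.open(path, "rb") as f:
        head = f.read(len(OLE2_MAGIC))
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python sniff_excel_gz.py /seatunnel/read/gz/excel/single/e2e-xls-gz.xls.gz
    print(sniff_excel_gz(sys.argv[1]))
```

If this prints `xlsx` for the `.xls.gz` fixture, the file was saved in the wrong format; if it prints `xls`, the "not supported" error comes from the reader side rather than the test data.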


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

