[PR] [spark] Add load_csv and export_csv procedures [paimon]

via GitHub Tue, 19 May 2026 02:05:18 -0700


JunRuiLee opened a new pull request, #7898:
URL: https://github.com/apache/paimon/pull/7898


   ### Purpose
   In our scenario, many algorithm engineers work directly with datasets in CSV 
format. This PR adds Spark `load_csv` and `export_csv` procedures to make it 
easy to move data between CSV files and Paimon tables without writing custom 
Spark jobs.
   
   `load_csv` imports CSV files into an existing Paimon table. It matches CSV 
header columns to target table columns by exact name, writes missing columns as 
null, drops extra columns, and always uses Spark CSV `PERMISSIVE` mode so 
malformed rows are counted in `invalid_count` and skipped. Nested columns are 
restored from JSON strings.
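
The matching rules above can be illustrated with a small pure-Python sketch. This is not the procedure's implementation and does not use Spark or Paimon: `align_rows`, its parameters, and the sample data are hypothetical names chosen for illustration. It assumes a row is malformed when its field count differs from the header, or when a nested column does not hold valid JSON; the PR does not spell out either rule. Values stay as strings because no type casting is modeled.

```python
import csv
import io
import json


def align_rows(csv_text, target_columns, nested_columns=()):
    """Align CSV records to target_columns by exact header name.

    Returns (rows, invalid_count). Hypothetical helper, for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to load
        return [], 0

    rows, invalid_count = [], 0
    for record in reader:
        # Assumption: a field-count mismatch marks the row as malformed.
        if len(record) != len(header):
            invalid_count += 1
            continue
        by_name = dict(zip(header, record))
        row = {}
        try:
            for col in target_columns:
                # Missing columns become None (null); extra columns are
                # dropped simply because they are never looked up.
                value = by_name.get(col)
                if value is not None and col in nested_columns:
                    # Nested columns are restored from their JSON string form.
                    value = json.loads(value)
                row[col] = value
        except json.JSONDecodeError:
            # Assumption: undecodable JSON also counts as a malformed row.
            invalid_count += 1
            continue
        rows.append(row)
    return rows, invalid_count


sample = (
    "id,name,extra,tags\n"
    '1,alice,x,"[""a"",""b""]"\n'
    "2,bob\n"
    '3,carol,y,"[]"\n'
)
rows, invalid_count = align_rows(sample, ["id", "name", "tags", "score"], {"tags"})
print(rows)
print(invalid_count)
```

Here `extra` is dropped because the target has no such column, `score` is filled with `None` because the CSV has no such header, and the short second record is skipped and counted.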
   
   `export_csv` exports a Paimon table to a Spark CSV output directory, with 
optional `where` filtering. Nested columns are serialized as JSON strings, and 
`quoteAll=true` is enabled by default so JSON values containing commas are 
quoted correctly. Existing output paths are overwritten.
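
To show why quoting every field matters for JSON values, here is a minimal pure-Python sketch of the export side. It is not the procedure's implementation: `export_rows` is a hypothetical helper, the `where` argument is modeled as a Python predicate rather than a SQL filter string, and writing a header row is an assumption. `csv.QUOTE_ALL` plays the role that `quoteAll=true` plays in the Spark CSV writer.

```python
import csv
import io
import json


def export_rows(rows, columns, where=None):
    """Serialize rows to CSV text with every field quoted.

    Hypothetical helper, for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)  # assumption: a header row is written
    for row in rows:
        if where is not None and not where(row):
            continue  # optional filter, standing in for the `where` argument
        writer.writerow(
            [
                # Nested values (lists, dicts) are serialized as JSON strings.
                json.dumps(row[c]) if isinstance(row[c], (dict, list)) else row[c]
                for c in columns
            ]
        )
    return buf.getvalue()


table = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
]
text = export_rows(table, ["id", "tags"])
print(text)

# Round trip: the JSON value survives although it contains a comma.
parsed = list(csv.reader(io.StringIO(text)))
print(json.loads(parsed[1][1]))
```

Because the JSON text for a list or struct contains commas and double quotes, quoting the field and doubling the embedded quotes keeps it in a single CSV column when the file is read back.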
   
   
   ### Tests
   Added `CsvProcedureTest`.


