JunRuiLee opened a new pull request, #7898:
URL: https://github.com/apache/paimon/pull/7898
### Purpose
In our scenario, many algorithm engineers work directly with datasets in CSV
format. This PR adds Spark `load_csv` and `export_csv` procedures to make it
easy to move data between CSV files and Paimon tables without writing custom
Spark jobs.
`load_csv` imports CSV files into an existing Paimon table. It matches CSV
header columns to target table columns by exact name, writes missing columns as
null, drops extra columns, and always uses Spark CSV `PERMISSIVE` mode so
malformed rows are counted in `invalid_count` and skipped. Nested columns are
restored from JSON strings.
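
The column-matching rules above can be sketched without Spark. The following is a minimal standard-library illustration, not the procedure's actual implementation: `load_csv_rows`, its parameters, and the sample data are hypothetical, and "malformed" is assumed here to mean a row whose field count differs from the header or whose nested value is not valid JSON.

```python
import csv
import io
import json


def load_csv_rows(csv_text, target_columns, nested_columns=()):
    """Align CSV rows to target_columns; return (rows, invalid_count).

    Hypothetical helper that mimics the behavior described for load_csv.
    Values stay as strings; no type casting is attempted in this sketch.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows, invalid_count = [], 0
    for fields in reader:
        # Assumption: a field-count mismatch marks the row as malformed.
        if len(fields) != len(header):
            invalid_count += 1
            continue
        record = dict(zip(header, fields))
        row = {}
        try:
            for col in target_columns:
                # Exact-name match; columns absent from the CSV become None.
                value = record.get(col)
                if value is not None and col in nested_columns:
                    # Nested columns are restored from their JSON string form.
                    value = json.loads(value)
                row[col] = value
        except json.JSONDecodeError:
            invalid_count += 1
            continue
        # CSV columns not listed in target_columns are dropped implicitly.
        rows.append(row)
    return rows, invalid_count


sample = (
    'id,tags,extra\n'
    '1,"[""a"",""b""]",x\n'
    '2,"[]",y\n'
    '3,only-two-fields\n'
)
rows, invalid_count = load_csv_rows(sample, ["id", "tags", "score"], {"tags"})
print(rows, invalid_count)
```

With this sample, the `extra` column is dropped, the missing `score` column is filled with `None`, and the third data row is counted as invalid and skipped.
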
`export_csv` exports a Paimon table to a Spark CSV output directory, with
optional `where` filtering. Nested columns are serialized as JSON strings, and
`quoteAll=true` is enabled by default so JSON values containing commas are
quoted correctly. Existing output paths are overwritten.
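
To show why quoting every field matters for JSON-serialized nested columns, here is a small standard-library sketch. It is an illustration only: `export_csv_text` is a hypothetical helper, the `where` argument is modeled as a Python predicate rather than a SQL filter string, and `csv.QUOTE_ALL` stands in for Spark's `quoteAll=true`.

```python
import csv
import io
import json


def export_csv_text(rows, columns, where=None):
    """Serialize rows to CSV text, quoting every field.

    Hypothetical helper that mimics the behavior described for export_csv.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Optional filter, standing in for the procedure's `where` argument.
        if where is not None and not where(row):
            continue
        out = []
        for col in columns:
            value = row.get(col)
            # Nested values are written as JSON strings.
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            out.append(value)
        writer.writerow(out)
    return buf.getvalue()


table = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
text = export_csv_text(table, ["id", "tags"], where=lambda r: r["id"] == 1)
print(text)

# Reading the text back recovers the nested value despite the embedded comma.
parsed = list(csv.reader(io.StringIO(text)))
restored = json.loads(parsed[1][1])
```

Because the JSON array contains a comma, an unquoted field would be split into two columns on read; quoting keeps it as a single field that parses back to the original list.
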
### Tests
Added `CsvProcedureTest`.
--
This is an automated message from the Apache Git Service.