[PR] [spark] Support JSON format in COPY INTO [paimon]

via GitHub Wed, 27 May 2026 01:43:57 -0700


JunRuiLee opened a new pull request, #7993:
URL: https://github.com/apache/paimon/pull/7993


   ## Summary
   
   - Add JSON format support for `COPY INTO` import and export, alongside the existing CSV support
   - JSON uses column-name matching (not positional), with options for `MULTI_LINE`, `NULL_IF`, `EMPTY_FIELD_AS_NULL`, and `COMPRESSION`
   - CSV-only options (e.g. `FIELD_DELIMITER`, `SKIP_HEADER`) are rejected for the JSON format with a clear error message
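The option handling summarized above might look roughly like the following. This is a minimal sketch, not the code in `CopyOptions.scala`. The object and function names are hypothetical, options are assumed to arrive as a case-insensitive string map, and only the two CSV-only options named in this PR are checked. `multiLine`, `compression`, and `mode` are real Spark JSON data source options (`multiLine` and `mode` apply when reading, `compression` when writing). Using `FAILFAST` to abort on malformed records and applying `NULL_IF` / `EMPTY_FIELD_AS_NULL` after the read are assumptions.

```scala
// Hypothetical sketch of JSON option validation and Spark option mapping for
// COPY INTO; names are illustrative and not Paimon's actual API.
object JsonCopyOptionsSketch {

  // CSV-only options named in the PR; the real list may be longer.
  val CsvOnlyOptions: Set[String] = Set("FIELD_DELIMITER", "SKIP_HEADER")

  /** Normalizes keys to upper case and rejects CSV-only options for JSON. */
  def validateJsonOptions(options: Map[String, String]): Either[String, Map[String, String]] = {
    val normalized = options.map { case (k, v) => k.toUpperCase -> v }
    val csvOnly = normalized.keySet.intersect(CsvOnlyOptions).toSeq.sorted
    if (csvOnly.nonEmpty)
      Left(s"Option(s) ${csvOnly.mkString(", ")} are only valid for CSV, not JSON")
    else
      Right(normalized)
  }

  /** Maps validated (upper-cased) COPY INTO options onto Spark JSON options. */
  def toSparkJsonOptions(options: Map[String, String]): Map[String, String] = {
    // Assumption: fail the whole read on the first malformed record.
    val base = Map("mode" -> "FAILFAST")
    val multiLine = options.get("MULTI_LINE").map(v => "multiLine" -> v.toLowerCase)
    val compression = options.get("COMPRESSION").map(v => "compression" -> v.toLowerCase)
    // NULL_IF and EMPTY_FIELD_AS_NULL have no direct Spark JSON equivalent.
    base ++ multiLine.toList ++ compression.toList
  }

  /** Assumed post-read step: NULL_IF strings (and optionally "") become null. */
  def applyNullIf(value: String, nullIf: Set[String], emptyAsNull: Boolean): Option[String] =
    if (nullIf.contains(value) || (emptyAsNull && value.isEmpty)) None else Some(value)
}
```

Rejecting CSV-only options up front, rather than silently ignoring them, gives the user an error that names the offending options instead of an import that quietly behaves differently from what was asked.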
   
   ## Motivation
   
   JSON is a common format for semi-structured data in data lake scenarios. Some users have requested JSON support for `COPY INTO` to complement the existing CSV capability.
   
   ## Changes
   
   - **Grammar**: Add a `JSON` lexer token to `PaimonSqlExtensions.g4`
   - **CopyOptions.scala**: Add `FileFormatType.JSON`, format-specific option validation, and the mapping to Spark reader/writer options
   - **CopyIntoTableExec.scala**: Read JSON with a column-name schema (instead of CSV's positional `_c0`/`_c1` columns) and dispatch to `.json()` or `.csv()` by format type
   - **CopyIntoLocationExec.scala**: Dispatch export by format type
   - **Documentation**: Update `sql-write.md` with JSON syntax, options, and column-mapping semantics
   ## Tests
   
   Added 16 JSON test cases covering: basic import, column-name matching, multi-line input, an explicit column list, `NULL_IF`, export, option validation, round-trip (export then import), handling of extra and missing fields, aborting on malformed data and on bad casts, GZIP compression, and casting to date/timestamp columns.

