JunRuiLee opened a new pull request, #8037: URL: https://github.com/apache/paimon/pull/8037
This PR adds Parquet format support for COPY INTO import and export, as part of #8005.

## Changes

**Import** (`COPY INTO table FROM path`):
- Read Parquet files with native typed schema (no string-then-cast like CSV/JSON)
- Column matching by name (case-insensitive), not by position
- Extra source columns are ignored; missing columns become NULL
- Cast validation: detects non-null → null after casting (type incompatibility)
- Supports explicit column list, PATTERN, FORCE, ON_ERROR = ABORT_STATEMENT

**Export** (`COPY INTO path FROM table`):
- Write Parquet files via `df.write.parquet()`
- COMPRESSION option (SNAPPY, GZIP, NONE, etc.)

**Refactoring**:
- Extract `resolveDefaultColumn()` shared helper (was duplicated in Parquet and text paths)
- Unify `recordHistoryAndBuildResults()` to accept a `countDf` parameter (eliminates ~45 lines of copy-paste between Parquet and text paths)
- Add `logWarning` when default value expression parsing fails (was silently swallowed)

## Tests

12 new tests covering: basic import, column name matching, explicit column list, export, export with compression, round-trip, extra fields ignored, missing fields become null, FORCE=FALSE dedup, PATTERN filtering, unsupported option error, rows_loaded count accuracy.
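
To illustrate the import rules described in this PR (case-insensitive matching by name, extra source columns ignored, missing columns filled with NULL, and the non-null → null cast check), here is a minimal standalone sketch in plain Java. It is not the PR's code: the class `ColumnMatchSketch` and its methods are hypothetical names chosen for illustration, it does not touch Spark or Paimon APIs, and the first-match-wins handling of source names that collide after lowercasing is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not Paimon's actual implementation. */
final class ColumnMatchSketch {

    /**
     * For each target column, returns the index of the source column whose
     * name matches case-insensitively, or -1 when the source has no such
     * column (the caller then fills that column with NULL).
     * Source columns that match no target column are simply never referenced.
     */
    static int[] matchByName(List<String> targetCols, List<String> sourceCols) {
        Map<String, Integer> sourceIndex = new HashMap<>();
        for (int i = 0; i < sourceCols.size(); i++) {
            // Assumption: if two source names collide after lowercasing, the first wins.
            sourceIndex.putIfAbsent(sourceCols.get(i).toLowerCase(Locale.ROOT), i);
        }
        int[] mapping = new int[targetCols.size()];
        for (int t = 0; t < targetCols.size(); t++) {
            String key = targetCols.get(t).toLowerCase(Locale.ROOT);
            mapping[t] = sourceIndex.getOrDefault(key, -1);
        }
        return mapping;
    }

    /** Projects one source row onto the target column order using the mapping. */
    static Object[] project(Object[] sourceRow, int[] mapping) {
        Object[] out = new Object[mapping.length];
        for (int t = 0; t < mapping.length; t++) {
            // -1 means the target column is missing from the source: emit NULL.
            out[t] = mapping[t] < 0 ? null : sourceRow[mapping[t]];
        }
        return out;
    }

    /**
     * Cast validation: a value that was non-null before the cast but is null
     * afterwards indicates a type incompatibility rather than a genuine NULL.
     */
    static boolean castLostValue(Object before, Object after) {
        return before != null && after == null;
    }
}
```

With target columns `id, name, age` and source columns `NAME, extra, Id`, `matchByName` maps `id` to source index 2, `name` to index 0, and `age` to -1, so `extra` is dropped and `age` is filled with NULL for every row.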
