danny0405 opened a new pull request, #18741:
URL: https://github.com/apache/hudi/pull/18741

   ### Describe the issue this Pull Request addresses
   
   Flink currently has no append-only Lance base-file path for tables without 
primary keys. Supporting Lance requires Flink-specific RowData writer and 
reader plumbing, table-format validation, and COW input-format handling, so 
that append-only tables without primary keys can ingest and read Lance base files.
   
   This PR adds Lance support for Flink COPY_ON_WRITE append-only INSERT tables 
without primary keys. It explicitly rejects unsupported Lance combinations such 
as primary-key tables, MERGE_ON_READ tables, and non-INSERT write operations. 
This PR introduces user-visible storage-format support for Flink under those 
constraints.
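
The accepted table shape can be read as one guard that runs before any writer is created. This is a minimal sketch built on stated assumptions. `LanceTableGuard`, the `TableType` and `WriteOperation` enums, and the use of `IllegalArgumentException` are all hypothetical stand-ins. They are not the validation code in `HoodieTableFactory`, and they are not the exception type it throws.

```java
import java.util.Objects;

// Hypothetical stand-ins for Hudi's table type and write operation options.
enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

enum WriteOperation { INSERT, UPSERT, BULK_INSERT }

// Hypothetical guard: Lance is accepted only for COPY_ON_WRITE tables
// without a primary key that are written with the INSERT operation.
final class LanceTableGuard {
  private LanceTableGuard() {}

  static void validate(TableType tableType, boolean hasPrimaryKey, WriteOperation operation) {
    Objects.requireNonNull(tableType, "tableType");
    Objects.requireNonNull(operation, "operation");
    if (hasPrimaryKey) {
      throw new IllegalArgumentException(
          "Lance base files are only supported for tables without a primary key");
    }
    if (tableType != TableType.COPY_ON_WRITE) {
      throw new IllegalArgumentException(
          "Lance base files are only supported for COPY_ON_WRITE tables, got " + tableType);
    }
    if (operation != WriteOperation.INSERT) {
      throw new IllegalArgumentException(
          "Lance base files are only supported for the INSERT operation, got " + operation);
    }
  }
}
```

In a design like this, all three conditions are checked before any file is written. That is what lets an unsupported shape fail early instead of partway through a write.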
   
   ### Summary and Changelog
   
   Flink can now write and read Lance base files for append-only COW tables 
without primary keys, including projected reads from Lance files.
   
   #### Add Flink Lance append-only writer/reader support
   - Added `HoodieRowDataLanceWriter`, `HoodieFlinkLanceArrowUtils`, and 
`HoodieBloomFilterStringWriteSupport` for primitive RowData-to-Arrow Lance 
writes.
   - Updated `HoodieRowDataCreateHandle` and `HoodieRowDataFileWriterFactory` 
to dispatch writers by base-file extension and create Lance writers.
   - Added `HoodieRowDataLanceReader` and wired it through 
`HoodieRowDataFileReaderFactory`, `FlinkRowDataReaderContext`, and 
`CopyOnWriteInputFormat`.
   - Updated Lance reader resource ownership so that closing the iterator releases 
the Arrow reader, Lance reader, allocator, and parent metadata reader.
   - Reordered Lance reader vectors by requested field name so projected reads 
like `select name, uuid` return columns in Flink projection order.
   - Marked Lance files unsplittable in `CopyOnWriteInputFormat`.
   - Added `StreamerUtil.getLanceWriteConfig(...)` and persisted 
`hoodie.table.base.file.format` during table initialization.
   - Updated `HoodieTableFactory` validation to allow Lance only for 
append-only COPY_ON_WRITE INSERT tables without primary keys.
   - Replaced the old Flink Lance rejection IT with 
`ITTestHoodieDataSource#testLanceFormatAppendOnlyWriteAndRead`.
   - Added catalog/table-factory tests covering append-only Lance table creation 
and rejection of keyed, MOR, and non-INSERT Lance writes.
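
Writer dispatch by base-file extension can be shown with a small lookup. This sketch is hypothetical throughout. `BaseFileFormat`, `RowDataWriterDispatch`, and the returned labels are illustrative names, and the sketch assumes Lance base files carry a `.lance` extension. The real `HoodieRowDataFileWriterFactory` builds writer instances; it does not return strings.

```java
import java.util.Locale;

// Hypothetical enum of base-file formats keyed by file extension.
enum BaseFileFormat {
  PARQUET(".parquet"),
  LANCE(".lance"); // assumed extension for Lance base files

  private final String extension;

  BaseFileFormat(String extension) {
    this.extension = extension;
  }

  // Resolves the format from the base-file path's extension (case-insensitive).
  static BaseFileFormat fromPath(String path) {
    String lower = path.toLowerCase(Locale.ROOT);
    for (BaseFileFormat format : values()) {
      if (lower.endsWith(format.extension)) {
        return format;
      }
    }
    throw new UnsupportedOperationException("Unsupported base file extension: " + path);
  }
}

// Hypothetical factory: selects a writer by extension instead of always using Parquet.
final class RowDataWriterDispatch {
  private RowDataWriterDispatch() {}

  static String writerFor(String path) {
    switch (BaseFileFormat.fromPath(path)) {
      case PARQUET:
        return "parquet writer";
      case LANCE:
        return "lance writer";
      default:
        throw new IllegalStateException("Unhandled format for " + path);
    }
  }
}
```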
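
The resource-ownership change can be sketched with plain `AutoCloseable`s. The iterator holds every resource it was built from, so closing the iterator alone releases all of them. `OwningIterator` is a hypothetical class. The Arrow reader, Lance reader, allocator, and parent metadata reader are modeled here as generic `AutoCloseable`s. Closing in reverse acquisition order (readers before the allocator they draw from) is an assumption of this sketch, not a detail the PR confirms.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical iterator that owns the resources it was built from, so that
// closing the iterator releases all of them.
final class OwningIterator<T> implements Iterator<T>, AutoCloseable {
  private final Iterator<T> delegate;
  // Resources in acquisition order, e.g. allocator, Lance reader, Arrow reader.
  private final List<AutoCloseable> owned;
  private boolean closed;

  OwningIterator(Iterator<T> delegate, List<AutoCloseable> ownedInAcquisitionOrder) {
    this.delegate = delegate;
    this.owned = new ArrayList<>(ownedInAcquisitionOrder);
  }

  @Override
  public boolean hasNext() {
    return !closed && delegate.hasNext();
  }

  @Override
  public T next() {
    if (closed) {
      throw new IllegalStateException("iterator already closed");
    }
    return delegate.next();
  }

  @Override
  public void close() throws Exception {
    if (closed) {
      return; // idempotent: a second close is a no-op
    }
    closed = true;
    Exception failure = null;
    // Close in reverse acquisition order; keep going if one close fails.
    for (int i = owned.size() - 1; i >= 0; i--) {
      try {
        owned.get(i).close();
      } catch (Exception e) {
        if (failure == null) {
          failure = e;
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    if (failure != null) {
      throw failure;
    }
  }
}
```

Later failures are recorded as suppressed exceptions on the first one. This way, one failing `close()` does not leak the remaining resources.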
   
   ### Impact
   
   This enables a new Flink write/read path for Lance base files, scoped to 
COPY_ON_WRITE append-only tables without primary keys. Existing Parquet 
behavior is preserved, with writer creation now dispatched by file extension 
instead of always creating a Parquet writer.
   
   Unsupported Lance table shapes fail early with explicit validation messages. 
Complex/nested logical types are not supported by the Flink Lance Arrow 
conversion helpers in this change; unsupported types throw 
`HoodieNotSupportedException`.
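
The primitive-only restriction works like a conversion table that rejects nested types outright. This is a minimal sketch with a simplified type model, and its names and choices are assumptions:

- `LogicalKind`, `PrimitiveArrowMapping`, and the Arrow type labels are hypothetical.
- The primitives shown are illustrative, not the exact set the PR supports.
- `UnsupportedOperationException` stands in for `HoodieNotSupportedException`, so the example runs with the JDK alone.

```java
// Hypothetical, simplified mirror of Flink logical type roots.
enum LogicalKind { BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR, ARRAY, MAP, ROW }

final class PrimitiveArrowMapping {
  private PrimitiveArrowMapping() {}

  // Returns an illustrative Arrow type label for primitive kinds; nested kinds
  // (ARRAY, MAP, ROW) fall through to the default branch and are rejected.
  static String toArrow(LogicalKind kind) {
    switch (kind) {
      case BOOLEAN:
        return "Bool";
      case INTEGER:
        return "Int(32, signed)";
      case BIGINT:
        return "Int(64, signed)";
      case DOUBLE:
        return "FloatingPoint(DOUBLE)";
      case VARCHAR:
        return "Utf8";
      default:
        throw new UnsupportedOperationException(
            "Unsupported type for Lance Arrow conversion: " + kind);
    }
  }
}
```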
   
   ### Risk Level
   
   medium
   
   This touches storage-format writer/reader paths, Flink table validation, COW 
input-format reading, and table initialization. The main risks are projection 
correctness, resource cleanup, and accidental enablement for unsupported table 
modes. Mitigation consists of a focused compile check plus targeted unit and integration tests:
   
   - `mvn -pl hudi-flink-datasource/hudi-flink -am -DskipTests -DskipITs -Dscala-2.12 compile`
   - `mvn -pl hudi-flink-datasource/hudi-flink -am -Dscala-2.12 -Dtest=TestHoodieTableFactory#testLanceFormatSupportedForAppendOnlyTables,org.apache.hudi.table.catalog.TestHoodieCatalog#testCreateAppendOnlyLanceTableWithoutPrimaryKey,ITTestHoodieDataSource#testLanceFormatAppendOnlyWriteAndRead -Dsurefire.failIfNoSpecifiedTests=false test`
   
   The focused test run passed with `Tests run: 3, Failures: 0, Errors: 0`.
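
Projection correctness is one of the main risks, so here is a sketch of name-based reordering for a read like `select name, uuid`. It is hypothetical and makes three simplifications:

- Plain column-name lists stand in for Lance/Arrow vectors.
- `ProjectionOrder` is an illustrative class, not the PR's API.
- It assumes the reader returns columns in file-schema order (here `uuid, name`) rather than in the requested order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProjectionOrder {
  private ProjectionOrder() {}

  // For each requested field name, returns the position of that column in the
  // order the file reader produced it, so output follows the Flink projection order.
  static int[] reorder(List<String> fileColumnOrder, List<String> requestedFields) {
    Map<String, Integer> positionByName = new HashMap<>();
    for (int i = 0; i < fileColumnOrder.size(); i++) {
      positionByName.put(fileColumnOrder.get(i), i);
    }
    int[] mapping = new int[requestedFields.size()];
    for (int i = 0; i < requestedFields.size(); i++) {
      Integer pos = positionByName.get(requestedFields.get(i));
      if (pos == null) {
        throw new IllegalArgumentException("Requested field not found: " + requestedFields.get(i));
      }
      mapping[i] = pos;
    }
    return mapping;
  }

  // Applies the mapping to one row whose values are in file column order.
  static List<Object> project(List<Object> rowInFileOrder, int[] mapping) {
    List<Object> out = new ArrayList<>(mapping.length);
    for (int pos : mapping) {
      out.add(rowInFileOrder.get(pos));
    }
    return out;
  }
}
```

Take file order `[uuid, name]` and requested fields `[name, uuid]`. `reorder` yields `{1, 0}`, and `project` turns the row `[id1, Danny]` into `[Danny, id1]`. Matching by name rather than by position is what keeps the output aligned with the projection when the two orders differ.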
   
   ### Documentation Update
   
   Documentation update is recommended because this adds new user-facing Flink 
Lance support with important constraints. Only COPY_ON_WRITE append-only INSERT 
tables without primary keys are supported. The current RowData-to-Arrow 
conversion path supports only primitive column types.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
