danny0405 opened a new pull request, #18862:
URL: https://github.com/apache/hudi/pull/18862

   ### Describe the issue this Pull Request addresses
   
   Flink Lance base-file support was previously scoped to append-only 
COPY_ON_WRITE tables without primary keys. That prevented keyed COW tables and 
upsert workloads from using `hoodie.table.base.file.format=LANCE`, even though 
the Flink RowData Lance writer and reader path can participate in the generic 
COW write/merge flow.
   
   The Flink Lance reader also did not implement `filterRowKeys`, which keyed 
COW and index-related paths use to find candidate record keys in existing base 
files.
   
   ### Summary and Changelog
   
   This PR extends Flink Lance support from append-only COW tables to keyed 
COPY_ON_WRITE tables, while keeping MOR and schema evolution unsupported.
   
   #### Commit 1: feat: add lance format support for Flink COW table (`43004d1b7ed`)
   - Relaxed `HoodieTableFactory` Lance validation so Flink Lance is accepted 
for COPY_ON_WRITE tables in general, not only for append-only tables without 
primary keys.
   - Kept validation errors for MERGE_ON_READ Lance tables and schema evolution 
with Lance base files.
   - Implemented `HoodieRowDataLanceReader.filterRowKeys` by scanning projected 
record keys and returning matching key/position pairs.
   - Updated `HoodieRowDataLanceWriter` documentation from append-only base 
files to general Flink RowData base files.
   - Updated `TestHoodieTableFactory` to allow Lance COW tables with primary 
keys, upsert operation, and explicit record-key fields.
   - Added 
`ITTestHoodieDataSource.testLanceFormatCopyOnWriteUpsertWriteAndRead` to write 
initial records, apply updates, and read merged results from a Lance COW table.
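
   The `filterRowKeys` behavior listed above (scan the projected record-key 
column, return matching key/position pairs) can be sketched in plain Java. This 
is an illustrative stand-in, not the code in `HoodieRowDataLanceReader`: the 
class `RowKeyFilterSketch`, its signature, the `Iterator<String>` used in place 
of a Lance column scan, and the 0-based positions are all assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Hudi reader implementation.
final class RowKeyFilterSketch {

  // Walks record keys in file order and keeps those found in candidateKeys,
  // paired with their 0-based row position (assumed convention).
  static List<Map.Entry<String, Long>> filterRowKeys(
      Iterator<String> projectedKeys, Set<String> candidateKeys) {
    List<Map.Entry<String, Long>> matches = new ArrayList<>();
    long position = 0;
    while (projectedKeys.hasNext()) {
      String key = projectedKeys.next();
      if (candidateKeys.contains(key)) {
        matches.add(Map.entry(key, position));
      }
      position++; // advance for every scanned row, matched or not
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> hits = filterRowKeys(
        List.of("id1", "id2", "id3").iterator(), Set.of("id1", "id3", "id9"));
    hits.forEach(h -> System.out.println(h.getKey() + " @ " + h.getValue()));
  }
}
```

   For keys `id1, id2, id3` in file order and candidates `{id1, id3, id9}`, the 
sketch yields `id1` at position 0 and `id3` at position 2; `id9` is not in the 
file and is dropped.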
   
   ### Impact
   
   Flink users can now configure Lance as the base file format for 
COPY_ON_WRITE tables that use primary keys and upsert writes. No new config 
keys or public APIs are added.
   
   Compatibility constraints remain: Flink still rejects Lance base files for 
MERGE_ON_READ tables and when schema evolution is enabled. The change affects 
Flink table factory validation and Lance row-key filtering in the Flink reader 
path.
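
   The resulting validation rule can be summarized with a small sketch. It 
mirrors the constraints stated above but is not the `HoodieTableFactory` code: 
the class `LanceValidationSketch`, the method signature, the exception type, 
and the messages are assumptions made for illustration.

```java
// Hypothetical sketch of the rule; not the HoodieTableFactory implementation.
final class LanceValidationSketch {

  // Throws when a Lance base-file format is combined with an unsupported mode.
  static void validate(String baseFileFormat, String tableType,
                       boolean schemaEvolutionEnabled) {
    if (!"LANCE".equalsIgnoreCase(baseFileFormat)) {
      return; // other base-file formats are outside this rule
    }
    if ("MERGE_ON_READ".equalsIgnoreCase(tableType)) {
      throw new IllegalArgumentException(
          "Lance base files are not supported for MERGE_ON_READ tables");
    }
    if (schemaEvolutionEnabled) {
      throw new IllegalArgumentException(
          "Lance base files are not supported with schema evolution");
    }
    // COPY_ON_WRITE passes, with or without primary keys.
  }

  public static void main(String[] args) {
    validate("LANCE", "COPY_ON_WRITE", false); // accepted
    try {
      validate("LANCE", "MERGE_ON_READ", false);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```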
   
   ### Risk Level
   
   medium
   
   This touches storage-format behavior on the Flink COW read/write path and 
enables a previously rejected keyed/upsert mode. Risk is mitigated by targeted 
factory validation coverage and a batch SQL integration test for Lance COW 
upsert write/read behavior.
   
   Validation evidence:
   - `git diff --check` passed.
   - Attempted `mvn -pl hudi-flink-datasource/hudi-flink -am -DskipITs 
-DskipIT=true -Dtest=TestHoodieTableFactory 
-Dsurefire.failIfNoSpecifiedTests=false test`, but the reactor failed before 
reaching `hudi-flink` due to an unrelated existing compile error in 
`hudi-flink-client`: missing `DataTypeAdapter.variantParquetAnnotation()` 
referenced by `ParquetSchemaConverter.java`.
   
   ### Documentation Update
   
   none
   
   No new configuration is introduced. Existing Lance base-file-format 
documentation may need a follow-up update if the engine support matrix 
documents Flink append-only versus COW behavior.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


