MgjLLL opened a new pull request, #8217: URL: https://github.com/apache/paimon/pull/8217
### Purpose

Fix redundant filesystem I/O in `SplitRead` and `FileScanner` when reading schema.

`SplitRead` has 3 call sites that unconditionally call `schema_manager.get_schema(schema_id)` even when `schema_id == table.table_schema.id`, in which case the schema is already in memory. This causes unnecessary filesystem reads in the common case (no schema evolution).

Java equivalent (`RawFileSplitRead.createFileReader()`) short-circuits with:

```java
schemaId == schema.id() ? schema : schemaManager.schema(schemaId)
```

### Changes

- `split_read.py`: Add `_resolve_schema()` method that returns in-memory schema when id matches, replacing 3 direct `get_schema()` calls in `raw_reader_supplier`, `_get_fields_and_predicate`, and `_file_read_fields`
- `file_scanner.py`: Add `_schema_fields()` method with same short-circuit pattern for `SimpleStatsEvolutions`

### Tests

- Added `file_scanner_schema_fields_test.py` with 3 test cases covering short-circuit, delegation, and zero-id edge case
- All existing tests pass (106 passed)

--

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
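
For illustration only, here is a minimal sketch of the short-circuit pattern described under Changes. It is not the code from this PR: `TableSchema`, `CountingSchemaManager`, `Table`, and the free function `resolve_schema` are hypothetical stand-ins, and the real `_resolve_schema()` and `_schema_fields()` methods may be shaped differently. The `reads` counter stands in for the filesystem reads the change is meant to avoid.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableSchema:
    """Hypothetical, simplified schema: an id plus field names."""
    id: int
    fields: List[str]


class CountingSchemaManager:
    """Hypothetical stand-in for a schema manager.

    Each get_schema() call increments `reads`, simulating one
    filesystem read per lookup.
    """

    def __init__(self, schemas: Dict[int, TableSchema]) -> None:
        self._schemas = schemas
        self.reads = 0

    def get_schema(self, schema_id: int) -> TableSchema:
        self.reads += 1
        return self._schemas[schema_id]


@dataclass
class Table:
    """Hypothetical table holding its current schema in memory."""
    table_schema: TableSchema
    schema_manager: CountingSchemaManager


def resolve_schema(table: Table, schema_id: int) -> TableSchema:
    # Compare ids with ==, not truthiness, so schema id 0 still
    # takes the in-memory path.
    if schema_id == table.table_schema.id:
        return table.table_schema
    # Different id (schema evolution): delegate to the manager.
    return table.schema_manager.get_schema(schema_id)


if __name__ == "__main__":
    old = TableSchema(id=0, fields=["a"])
    current = TableSchema(id=1, fields=["a", "b"])
    manager = CountingSchemaManager({0: old, 1: current})
    table = Table(table_schema=current, schema_manager=manager)

    resolve_schema(table, 1)   # matches the in-memory schema
    print(manager.reads)       # no manager lookup so far
    resolve_schema(table, 0)   # older schema, delegated to the manager
    print(manager.reads)
```

The three scenarios named in the Tests section (short-circuit, delegation, and the zero-id edge case) map onto this sketch as: a matching id returns the in-memory object without touching the manager, a non-matching id goes through `get_schema()`, and an id of `0` still short-circuits because the comparison uses equality rather than truthiness.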
