[I] [Bug][python] SplitRead redundantly reads schema from filesystem when current schema is already in memory [paimon]

via GitHub Thu, 11 Jun 2026 23:41:05 -0700


MgjLLL opened a new issue, #8216:
URL: https://github.com/apache/paimon/issues/8216


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   master (latest)
   
   ### Compute Engine
   
   PythonAPI
   
   ### Minimal reproduce step
   
   1. Create a Paimon table and write data.
   2. Use pypaimon to read data via `SplitRead`.
   3. Observe that `schema_manager.get_schema()` is called even when `schema_id` matches the current table schema id.
   4. This triggers redundant filesystem reads for schema files that are already available in memory.
   
   ### What doesn't meet your expectations?
   
   When `schema_id == table.table_schema.id`, the Python read path should return the in-memory `table.table_schema` directly without filesystem access, matching the Java short-circuit pattern in `RawFileSplitRead.createFileReader()`.
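
The expected behavior could be sketched roughly as follows. This is an illustration only, not the actual pypaimon code: `TableSchema`, `CountingSchemaManager`, `Table`, and `resolve_schema` are hypothetical stand-ins, and only the names `schema_manager.get_schema()` and `table.table_schema` come from the description above. The manager here counts calls to simulate filesystem reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSchema:
    """Hypothetical, simplified schema: just an id and field names."""
    id: int
    fields: tuple = ()


class CountingSchemaManager:
    """Hypothetical stand-in for the schema manager.

    Each get_schema() call represents one filesystem read, so the
    counter shows how many reads a lookup strategy would trigger.
    """

    def __init__(self, schemas_on_disk):
        self._schemas_on_disk = dict(schemas_on_disk)
        self.read_count = 0

    def get_schema(self, schema_id):
        self.read_count += 1  # simulated filesystem access
        return self._schemas_on_disk[schema_id]


@dataclass
class Table:
    """Hypothetical table holding the current schema in memory."""
    table_schema: TableSchema
    schema_manager: CountingSchemaManager


def resolve_schema(table, schema_id):
    """Return the schema for schema_id, skipping the read when possible."""
    # Short-circuit: the current schema is already in memory.
    if schema_id == table.table_schema.id:
        return table.table_schema
    # Older schema ids still have to be loaded through the manager.
    return table.schema_manager.get_schema(schema_id)
```

With this shape, data files written under the current schema never touch the schema manager, and only files written under an older `schema_id` pay for a read.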
   
   ### Anything else?
   
   This is a companion to the fix for `FileScanner._schema_fields`, which had the same redundant read pattern. Both share the same root cause: a missing short-circuit for the current table schema id.
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


