JingsongLi commented on code in PR #8217:
URL: https://github.com/apache/paimon/pull/8217#discussion_r3418005728
##########
paimon-python/pypaimon/read/scanner/file_scanner.py:
##########
@@ -227,14 +227,19 @@ def __init__(
         # bucket set per ``total_buckets`` value.
         self._bucket_selector = self._init_bucket_selector()
-        def schema_fields_func(schema_id: int):
-            return self.table.schema_manager.get_schema(schema_id).fields
-
         self.simple_stats_evolutions = SimpleStatsEvolutions(
-            schema_fields_func,
+            self._schema_fields,
Review Comment:
This only protects the SimpleStatsEvolutions callback, but a normal scan
still resolves the file schema while decoding manifest entries before
_filter_manifest_entry() can use this helper. ManifestFileManager.read() still
calls table.schema_manager.get_schema(file_dict["_SCHEMA_ID"]).fields for every
entry, so current-schema files from a REST catalog can still hit the same 403
before this short-circuit is reached. Please apply the same current-schema
short-circuit in the manifest decode path as well, or pass a resolver into
ManifestFileManager.
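A minimal sketch of the second option (passing a resolver into the manifest decode path). Every class and function here is a hypothetical stand-in rather than the actual pypaimon code, and the `_FILE_NAME` key is assumed for illustration; only `_SCHEMA_ID` and the `get_schema(...).fields` call shape come from the comment above.

```python
from typing import Callable, Dict, List


class Schema:
    """Hypothetical stand-in for a table schema."""

    def __init__(self, schema_id: int, fields: List[str]):
        self.id = schema_id
        self.fields = fields


class RemoteSchemaManager:
    """Hypothetical stand-in for a schema manager whose lookups may be denied."""

    def __init__(self, schemas: Dict[int, Schema]):
        self._schemas = schemas
        self.calls: List[int] = []  # records every remote lookup

    def get_schema(self, schema_id: int) -> Schema:
        self.calls.append(schema_id)
        return self._schemas[schema_id]


def make_fields_resolver(
    current: Schema, manager: RemoteSchemaManager
) -> Callable[[int], List[str]]:
    """Return a resolver that answers current-schema lookups locally."""

    def resolve(schema_id: int) -> List[str]:
        if schema_id == current.id:
            # Short-circuit: no schema-manager call for the current schema.
            return current.fields
        return manager.get_schema(schema_id).fields

    return resolve


def decode_entries(
    entries: List[dict], fields_for: Callable[[int], List[str]]
) -> List[dict]:
    """Assumed decode loop: resolve each entry's fields via the injected resolver."""
    return [
        {"file": e["_FILE_NAME"], "fields": fields_for(e["_SCHEMA_ID"])}
        for e in entries
    ]


current = Schema(2, ["id", "name"])
manager = RemoteSchemaManager({1: Schema(1, ["id"])})
resolve = make_fields_resolver(current, manager)
decoded = decode_entries(
    [
        {"_SCHEMA_ID": 2, "_FILE_NAME": "a"},
        {"_SCHEMA_ID": 1, "_FILE_NAME": "b"},
    ],
    resolve,
)
```

With this shape, only the entry written under the older schema reaches the schema manager; entries on the current schema never trigger a remote lookup.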
##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -167,6 +167,14 @@ def _compute_nested_path_by_name(self) -> Optional[Dict[str, List[str]]]:
     def _nested_path_by_name(self) -> Optional[Dict[str, List[str]]]:
         return self._cached_nested_path_by_name
+    def _resolve_schema(self, schema_id: int):
Review Comment:
There is still another direct schema-manager access in
DataEvolutionSplitRead._create_union_reader(): when a regular file has no
write_cols, it calls self.table.schema_manager.get_schema(first_file.schema_id)
to derive field ids. If first_file.schema_id is the current schema id, this
bypasses _resolve_schema() and can trigger the same REST-catalog 403 on
data-evolution reads. Please switch that remaining call to
self._resolve_schema(first_file.schema_id) too.
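Roughly what the requested change could look like, as a self-contained sketch. The classes below are hypothetical stand-ins, not the real DataEvolutionSplitRead, and the body of `_resolve_schema` is assumed to short-circuit on the current schema id; only the case of a file without write_cols is modelled.

```python
from typing import Dict, List, Optional


class Field:
    """Hypothetical field with an id and a name."""

    def __init__(self, field_id: int, name: str):
        self.id = field_id
        self.name = name


class TableSchema:
    """Hypothetical table schema."""

    def __init__(self, schema_id: int, fields: List[Field]):
        self.id = schema_id
        self.fields = fields


class DataFile:
    """Hypothetical data file metadata; write_cols is None for a regular file."""

    def __init__(self, schema_id: int, write_cols: Optional[List[str]] = None):
        self.schema_id = schema_id
        self.write_cols = write_cols


class CountingSchemaManager:
    """Hypothetical schema manager that records each lookup."""

    def __init__(self, schemas: Dict[int, TableSchema]):
        self._schemas = schemas
        self.calls: List[int] = []

    def get_schema(self, schema_id: int) -> TableSchema:
        self.calls.append(schema_id)
        return self._schemas[schema_id]


class SplitReadSketch:
    def __init__(self, current_schema: TableSchema, schema_manager: CountingSchemaManager):
        self.current_schema = current_schema
        self.schema_manager = schema_manager

    def _resolve_schema(self, schema_id: int) -> TableSchema:
        # Assumed behaviour: reuse the already-loaded current schema.
        if schema_id == self.current_schema.id:
            return self.current_schema
        return self.schema_manager.get_schema(schema_id)

    def field_ids_without_write_cols(self, first_file: DataFile) -> List[int]:
        # Goes through _resolve_schema() instead of calling
        # schema_manager.get_schema() directly.
        schema = self._resolve_schema(first_file.schema_id)
        return [f.id for f in schema.fields]


current = TableSchema(3, [Field(0, "id"), Field(1, "name")])
old = TableSchema(1, [Field(0, "id")])
manager = CountingSchemaManager({1: old})
reader = SplitReadSketch(current, manager)

current_ids = reader.field_ids_without_write_cols(DataFile(schema_id=3))
calls_after_current = list(manager.calls)
old_ids = reader.field_ids_without_write_cols(DataFile(schema_id=1))
```

Routing the lookup through the helper means a file on the current schema never touches the schema manager, while an older schema id still falls back to it.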