JingsongLi commented on code in PR #8217:
URL: https://github.com/apache/paimon/pull/8217#discussion_r3418005728


##########
paimon-python/pypaimon/read/scanner/file_scanner.py:
##########
@@ -227,14 +227,19 @@ def __init__(
         # bucket set per ``total_buckets`` value.
         self._bucket_selector = self._init_bucket_selector()
 
-        def schema_fields_func(schema_id: int):
-            return self.table.schema_manager.get_schema(schema_id).fields
-
         self.simple_stats_evolutions = SimpleStatsEvolutions(
-            schema_fields_func,
+            self._schema_fields,

Review Comment:
   This only protects the SimpleStatsEvolutions callback, but a normal scan 
still resolves the file schema while decoding manifest entries before 
_filter_manifest_entry() can use this helper. ManifestFileManager.read() still 
calls table.schema_manager.get_schema(file_dict["_SCHEMA_ID"]).fields for every 
entry, so current-schema files from a REST catalog can still hit the same 403 
before this short-circuit is reached. Please apply the same current-schema 
short-circuit in the manifest decode path as well, or pass a resolver into 
ManifestFileManager.
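
A minimal sketch of the "pass a resolver" option, not the actual pypaimon code. `make_fields_resolver`, `ManifestDecoder`, `forbidden_lookup`, and the `_FILE_NAME` key are hypothetical names used only for illustration; only `_SCHEMA_ID` comes from the code discussed above. The idea is that the decode path receives a callable that answers current-schema lookups locally and only falls back to the schema manager for older schema ids.

```python
from typing import Callable, Dict, List

# A resolver maps a schema id to that schema's fields (plain names here).
FieldsResolver = Callable[[int], List[str]]


def make_fields_resolver(current_schema_id: int,
                         current_fields: List[str],
                         fallback: FieldsResolver) -> FieldsResolver:
    """Return a resolver that answers current-schema lookups locally."""
    def resolve(schema_id: int) -> List[str]:
        if schema_id == current_schema_id:
            # Short-circuit: no schema-manager round trip for the current schema.
            return current_fields
        return fallback(schema_id)
    return resolve


class ManifestDecoder:
    """Hypothetical stand-in for the manifest decode path."""

    def __init__(self, resolve_fields: FieldsResolver):
        self._resolve_fields = resolve_fields

    def decode(self, file_dicts: List[Dict]) -> List[Dict]:
        # Every entry goes through the injected resolver,
        # never directly through the schema manager.
        return [
            {"file": d["_FILE_NAME"],
             "fields": self._resolve_fields(d["_SCHEMA_ID"])}
            for d in file_dicts
        ]


def forbidden_lookup(schema_id: int) -> List[str]:
    # Simulates the REST catalog rejecting a schema fetch (assumption).
    raise PermissionError(f"403 when fetching schema {schema_id}")


decoder = ManifestDecoder(
    make_fields_resolver(3, ["id", "name"], forbidden_lookup)
)
entries = decoder.decode([{"_FILE_NAME": "data-0.parquet", "_SCHEMA_ID": 3}])
```

With this shape, the scanner and the manifest reader can share one resolver, so the short-circuit lives in a single place.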



##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -167,6 +167,14 @@ def _compute_nested_path_by_name(self) -> Optional[Dict[str, List[str]]]:
     def _nested_path_by_name(self) -> Optional[Dict[str, List[str]]]:
         return self._cached_nested_path_by_name
 
+    def _resolve_schema(self, schema_id: int):

Review Comment:
   There is still another direct schema-manager access in 
DataEvolutionSplitRead._create_union_reader(): when a regular file has no 
write_cols, it calls self.table.schema_manager.get_schema(first_file.schema_id) 
to derive field ids. If first_file.schema_id is the current schema id, this 
bypasses _resolve_schema() and can trigger the same REST-catalog 403 on 
data-evolution reads. Please switch that remaining call to 
self._resolve_schema(first_file.schema_id) too.
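
A rough sketch of the requested change, under stated assumptions. `UnionReaderSketch`, `field_ids_without_write_cols`, `RejectingSchemaManager`, and the `table_schema`, `id`, and `fields` attribute names are hypothetical stand-ins, and the body of `_resolve_schema()` shown here is only an assumed behavior (reuse the in-memory schema when the ids match). The stub schema manager always raises, so the example only succeeds if the current-schema lookup never reaches it.

```python
from types import SimpleNamespace
from typing import List


class RejectingSchemaManager:
    """Stub that fails every lookup, standing in for the REST-catalog 403."""

    def get_schema(self, schema_id: int):
        raise PermissionError(f"403 when fetching schema {schema_id}")


class UnionReaderSketch:
    """Hypothetical stand-in for the data-evolution read path."""

    def __init__(self, table):
        self.table = table

    def _resolve_schema(self, schema_id: int):
        # Assumed helper behavior: reuse the schema already held in memory
        # when the file was written with the current schema.
        current = self.table.table_schema
        if schema_id == current.id:
            return current
        return self.table.schema_manager.get_schema(schema_id)

    def field_ids_without_write_cols(self, first_file) -> List[int]:
        # Route through _resolve_schema() instead of calling
        # self.table.schema_manager.get_schema(...) directly.
        schema = self._resolve_schema(first_file.schema_id)
        return [field.id for field in schema.fields]


current_schema = SimpleNamespace(
    id=5,
    fields=[SimpleNamespace(id=0, name="id"), SimpleNamespace(id=1, name="name")],
)
table = SimpleNamespace(
    table_schema=current_schema,
    schema_manager=RejectingSchemaManager(),
)
reader = UnionReaderSketch(table)

# A regular file with no write_cols, written with the current schema.
first_file = SimpleNamespace(schema_id=5, write_cols=None)
field_ids = reader.field_ids_without_write_cols(first_file)
```

A stub like `RejectingSchemaManager` could also back a regression test for the data-evolution read path.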


