rdblue commented on code in PR #6525:
URL: https://github.com/apache/iceberg/pull/6525#discussion_r1067330294


##########
python/pyiceberg/avro/resolver.py:
##########
@@ -109,38 +109,46 @@ def resolve(
 
 
 class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
-    read_types: Optional[Dict[int, Callable[[Schema], StructProtocol]]]
+    read_types: Dict[int, Type[StructProtocol]]
+    context: List[int]
 
-    def __init__(self, read_types: Optional[Dict[int, Callable[[Schema], 
StructProtocol]]]):
+    def __init__(self, read_types: Dict[int, Type[StructProtocol]] = 
EMPTY_DICT) -> None:
         self.read_types = read_types
+        self.context = []
 
     def schema(self, schema: Schema, expected_schema: Optional[IcebergType], 
result: Reader) -> Reader:
         return result
 
+    def before_field(self, field: NestedField, field_partner: 
Optional[NestedField]) -> None:
+        self.context.append(field.field_id)
+
+    def after_field(self, field: NestedField, field_partner: 
Optional[NestedField]) -> None:
+        self.context.pop()
+
     def struct(self, struct: StructType, expected_struct: 
Optional[IcebergType], field_readers: List[Reader]) -> Reader:
+        # -1 indicates the struct root
+        read_struct_id = self.context[-1] if len(self.context) > 0 else -1
+        struct_callable = self.read_types.get(read_struct_id, Record)

Review Comment:
   There is a need to be able to read a subset of fields when reading a 
manifest list or a manifest file. Right now, we don't use the lower or upper 
bounds, for example, but we still read them into memory rather than skipping 
them. There are also other use cases. A generic record, for example, should be 
able to use the field names from the read schema.
   
   To do those things, this PR needs to support passing the read schema to the 
instance that is instantiated. I don't really care how we do it. It can be by 
inspecting the constructor, assuming that the constructor accepts a schema, 
checking for a `set_read_schema` method, or some other idea. But it needs to be 
done as part of this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to