CircArgs opened a new pull request #4016: URL: https://github.com/apache/iceberg/pull/4016
This PR takes a different approach from https://github.com/apache/iceberg/pull/3981, which sought to use a pattern similar to the Python typing generics (e.g. `List`, `Dict`, `Union`) to create types that could be used inside Iceberg and also serve as types that could be statically checked when used to annotate code. After discussions with @samredai and @rdblue I've revised it further so there is no real metaprogramming, yet we still get much of the value.

The same syntax as the current code is used to create types, e.g. `IntegerType()` and `StructType([NestedField(True, 1, "required_field", StringType()), NestedField(False, 2, "optional_field", IntegerType())])`, yet we get `==` for free (no dedicated `__eq__` methods) and can use `isinstance` to check types instead of `issubclass`, as was the case in https://github.com/apache/iceberg/pull/3981.

Take this example:

```python
>>> str(IntegerType())
integer
>>> IntegerType() is IntegerType()  # same object in memory
True
>>> repr(BooleanType())
BooleanType()
>>> repr(StructType(
...     [
...         NestedField(True, 1, "required_field", StringType()),
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... ))
StructType(fields=(NestedField(is_optional=True, field_id=1, name='required_field', field_type=StringType(), doc=None), NestedField(is_optional=False, field_id=2, name='optional_field', field_type=IntegerType(), doc=None)))
>>> str(StructType(
...     [
...         NestedField(True, 1, "required_field", StringType()),
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... ))
struct<[nestedfield<True, 1, required_field, string, None>, nestedfield<False, 2, optional_field, integer, None>]>
>>> StructType(
...     [
...         NestedField(True, 1, "required_field", StringType()),
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... ) == StructType(
...     [
...         NestedField(True, 1, "required_field", StringType()),
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... )
True
>>> StructType(
...     [
...         NestedField(True, 1, "required_field", StringType()),
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... ) == StructType(
...     [
...         NestedField(True, 0, "required_field", StringType()),  # id changed from 1 to 0
...         NestedField(False, 2, "optional_field", IntegerType()),
...     ]
... )
False
>>> isinstance(StringType(), StringType)
True
```

This `types.py` is about 100 lines shorter than the current one, while offering the additional functionality described above.

The centerpiece of this PR is the `__new__` method on the base `IcebergType`, which checks the attribute `_implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"]`. This attribute keeps track of `IcebergType` _instances_, keyed by the type's name and its attributes (as defined in the init).

Thank you to @samredai and @rdblue for helping to inspire this change.

Note: if this PR is accepted, https://github.com/apache/iceberg/pull/3981 should be closed.
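For readers who want to see the instance-caching idea in isolation, here is a minimal sketch of a `__new__` that consults an `_implemented` registry. It is not the code from the PR: the key layout (class name plus positional constructor arguments), the `FixedType` example class, and the requirement that arguments be hashable are all assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


class IcebergType:
    """Base class that returns one shared instance per (class name, args) pair."""

    # Assumed registry layout: (class name, constructor args) -> cached instance.
    _implemented: Dict[Tuple[str, Tuple[Any, ...]], "IcebergType"] = {}

    def __new__(cls, *args: Any) -> "IcebergType":
        # Arguments must be hashable in this sketch (e.g. pass tuples, not lists).
        key = (cls.__name__, args)
        cached = IcebergType._implemented.get(key)
        if cached is None:
            cached = super().__new__(cls)
            IcebergType._implemented[key] = cached
        return cached


class IntegerType(IcebergType):
    """Illustrative primitive type with no parameters."""


class FixedType(IcebergType):
    """Illustrative parameterized type (hypothetical for this sketch)."""

    def __init__(self, length: int) -> None:
        # __init__ runs on every call, but it only re-assigns the same value
        # because the cache key already includes `length`.
        self.length = length


print(IntegerType() is IntegerType())    # identical object
print(FixedType(16) == FixedType(16))    # equality falls back to identity
print(FixedType(16) == FixedType(8))     # different args, different object
```

Because equal arguments always yield the same object, the default identity-based `==` behaves like structural equality without a dedicated `__eq__`, and `isinstance` works as usual since the results are ordinary instances.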
