CircArgs opened a new pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016


   This PR takes a different approach from https://github.com/apache/iceberg/pull/3981, which 
sought to use a pattern similar to Python's `typing` generics (e.g. `List`, `Dict`, `Union`) 
to create types that could be used inside Iceberg and also serve as types 
that could be statically checked when used to annotate code. 
   
   After discussions with @samredai and @rdblue, I've revised it further so 
that there is no real metaprogramming, yet we still get much of the value.
   
   The same syntax as the current code is used to create types:
   
   `IntegerType(), StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )`
   
   yet we get `==` for free (no dedicated `__eq__` methods) and can use 
`isinstance` to check types instead of `issubclass` as was the case in 
https://github.com/apache/iceberg/pull/3981. 
   
   Take this example:
   
   ```python
   >>> str(IntegerType())
   integer
   
   >>> IntegerType() is IntegerType() # same object in memory
   True
   
   >>> repr(BooleanType())
   BooleanType()
   
   >>> repr(StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ))
   StructType(fields=(NestedField(is_optional=True, field_id=1, 
name='required_field', field_type=StringType(), doc=None), 
NestedField(is_optional=False, field_id=2, name='optional_field', 
field_type=IntegerType(), doc=None)))
   
   >>> str(StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ))
   struct<[nestedfield<True, 1, required_field, string, None>, 
nestedfield<False, 2, optional_field, integer, None>]>
   
   >>> StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )==StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )
   True 
   
   >>> StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )==StructType(
           [
               NestedField(True, 0, "required_field", StringType()),  # id changed from 1 to 0
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ) 
   False 
   
   >>> isinstance(StringType(), StringType)
   True
   ```
   
   This `types.py` is about 100 lines shorter than the current one while 
offering the additional functionality described above.
   
   The centerpiece of this PR is the `__new__` method on the base 
`IcebergType`, which checks the attribute `_implemented: Dict[Tuple[str, 
Tuple[Any]], "IcebergType"]`. This dictionary keeps track of `IcebergType` 
_instances_, keyed by the type's name and its attributes (as defined in 
the `__init__`).
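   
   To illustrate the `__new__`-based instance caching described above, here is a minimal sketch. It is not the code in this PR: the key construction, the handling of keyword arguments, and the simplified `IntegerType` and `FixedType` stand-ins are assumptions made for illustration, and it only works when constructor arguments are hashable.

```python
from typing import Any, Dict, Tuple


class IcebergType:
    """Base class that reuses one instance per (class name, constructor args)."""

    # Registry modeled on the `_implemented` attribute described above;
    # the exact key layout here is an assumption.
    _implemented: Dict[Tuple[str, Tuple[Any, ...]], "IcebergType"] = {}

    def __new__(cls, *args: Any, **kwargs: Any) -> "IcebergType":
        # Sort keyword arguments so their order does not change the key.
        # Note: in this sketch, passing the same value positionally and by
        # keyword produces two different keys.
        key = (cls.__name__, args + tuple(sorted(kwargs.items())))
        instance = cls._implemented.get(key)
        if instance is None:
            instance = super().__new__(cls)
            cls._implemented[key] = instance
        return instance


class IntegerType(IcebergType):
    """Simplified stand-in for a primitive type with no attributes."""

    def __str__(self) -> str:
        return "integer"


class FixedType(IcebergType):
    """Simplified stand-in for a type with one attribute."""

    def __init__(self, length: int) -> None:
        # __init__ runs again on a cached instance, but always with the
        # same arguments that produced the cache key.
        self.length = length


# Equal arguments give back the very same object, so the default
# identity-based `==` already behaves like structural equality.
assert IntegerType() is IntegerType()
assert FixedType(16) is FixedType(16)
assert FixedType(16) == FixedType(16)
assert FixedType(16) != FixedType(8)
assert isinstance(IntegerType(), IntegerType)
```

   Because every distinct combination of type name and arguments maps to exactly one object, no dedicated `__eq__` method is needed in this sketch, and `isinstance` works as with any ordinary class.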
   
   Thank you to @samredai and @rdblue for helping to inspire this change.
   
   Note: if this PR is accepted, https://github.com/apache/iceberg/pull/3981 
should be closed.
   




