dramaticlly opened a new issue, #4631:
URL: https://github.com/apache/iceberg/issues/4631

   In both issue 
https://github.com/apache/iceberg/issues/3228#issuecomment-1095661430 and pull 
request https://github.com/apache/iceberg/pull/3407#discussion_r738976688, 
there is some discussion of how to build an immutable PartitionSpec with 
proper validation of every PartitionField.
   
   - In Java, we use the builder pattern.
   - In Python, there is no perfect substitute; some have proposed looking into 
the `PartitionSpec` `__init__` method.
   
   In this issue, I want to highlight the list of validations we need to enforce 
before we can construct a valid PartitionSpec.
   
   In terms of attributes, a PartitionSpec has
   - spec_id (int): unique identifier of the spec
   - schema (Schema): the schema of the data table
   - a list of PartitionField
   - lastAssignedPartitionFieldId (int): partition field IDs start at an offset 
of 1000 and increment by 1 for each PartitionField
   
   Meanwhile, a PartitionField is defined by 
   - field_id (int): unique identifier of the partition field
   - source_id (int): the ID of the source column in the table schema
   - name (str): human-readable identifier
   - transform (Transform): the transformation from the source column in the 
data table to partition values, e.g. identity, bucket, etc.
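
   To make the two attribute lists above concrete, here is a minimal sketch of 
both classes as frozen dataclasses. It is only an illustration: `transform` and 
`schema` use placeholder types instead of real `Transform` and `Schema` classes, 
and `PARTITION_FIELD_ID_START` and `last_assigned_field_id` are hypothetical 
names chosen for this example. Returning one below the offset for a spec with 
no fields is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical constant: the offset of 1000 mentioned in the attribute list.
PARTITION_FIELD_ID_START = 1000


@dataclass(frozen=True)
class PartitionField:
    source_id: int   # ID of the source column in the table schema
    field_id: int    # unique ID of this partition field
    transform: str   # placeholder for a Transform, e.g. "identity"
    name: str        # human-readable partition name


@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    schema: object   # placeholder for a Schema
    fields: Tuple[PartitionField, ...]

    @property
    def last_assigned_field_id(self) -> int:
        # Highest field ID in use. Assumption: a spec without fields
        # reports one below the start offset.
        return max(
            (f.field_id for f in self.fields),
            default=PARTITION_FIELD_ID_START - 1,
        )


spec = PartitionSpec(
    spec_id=0,
    schema=None,
    fields=(
        PartitionField(1, 1000, "identity", "id"),
        PartitionField(2, 1001, "bucket", "ts_bucket"),
    ),
)
```

   `frozen=True` makes attribute assignment raise after construction, which is 
one way to approximate the immutability the Java builder provides.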
   
   Validations we want to enforce when building the PartitionSpec:
   - for each PartitionField passed to `__init__`, we want to make sure that
     - `findSourceColumn`: the source column can be found in the table schema 
by referencing its name
     - `checkAndAddPartitionName`: the new name is neither null nor empty, and 
is not a duplicate
     - `checkForRedundantPartitions`: no two PartitionFields in the 
PartitionSpec share the same tuple of source column id and transform
     - `checkCompatibility`: lastly, the source column in the table schema has 
a type that is not null and is an Iceberg primitive type, and the registered 
transform can be applied to that type
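
   As a rough sketch of how the four checks above might fit into `__init__`, 
consider the following. Everything here is a simplified stand-in rather than 
the actual Iceberg Python API: the schema is a plain dict from column id to 
`(name, type)`, transforms are plain strings, and `PRIMITIVE_TYPES` and 
`TRANSFORM_INPUTS` are hypothetical, deliberately incomplete tables. The source 
column is looked up by `source_id` here because the PartitionField described 
above carries an id rather than a name.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical, incomplete tables used only for this sketch.
PRIMITIVE_TYPES = {"boolean", "int", "long", "date", "timestamp", "string"}
TRANSFORM_INPUTS = {
    "identity": PRIMITIVE_TYPES,
    "bucket": {"int", "long", "date", "timestamp", "string"},
    "day": {"date", "timestamp"},
}


@dataclass(frozen=True)
class PartitionField:
    source_id: int
    field_id: int
    transform: str
    name: str


class PartitionSpec:
    def __init__(
        self,
        spec_id: int,
        schema: Dict[int, Tuple[str, str]],  # column id -> (name, type)
        fields: Tuple[PartitionField, ...],
    ):
        seen_names = set()
        seen_keys = set()
        for field in fields:
            # findSourceColumn: the source column must exist in the schema.
            if field.source_id not in schema:
                raise ValueError(f"Cannot find source column: {field.source_id}")
            _, source_type = schema[field.source_id]

            # checkAndAddPartitionName: non-empty and not used before.
            if not field.name:
                raise ValueError("Partition name must not be null or empty")
            if field.name in seen_names:
                raise ValueError(f"Duplicate partition name: {field.name}")
            seen_names.add(field.name)

            # checkForRedundantPartitions: unique (source id, transform).
            key = (field.source_id, field.transform)
            if key in seen_keys:
                raise ValueError(f"Redundant partition: {field.name}")
            seen_keys.add(key)

            # checkCompatibility: primitive type the transform accepts.
            if source_type not in PRIMITIVE_TYPES:
                raise ValueError(f"Not a primitive type: {source_type}")
            if source_type not in TRANSFORM_INPUTS.get(field.transform, set()):
                raise ValueError(
                    f"Transform {field.transform} cannot be applied to {source_type}"
                )

        self.spec_id = spec_id
        self.schema = schema
        self.fields = tuple(fields)


schema = {1: ("id", "long"), 2: ("ts", "timestamp"), 3: ("tags", "list")}
spec = PartitionSpec(
    0,
    schema,
    (
        PartitionField(1, 1000, "bucket", "id_bucket"),
        PartitionField(2, 1001, "day", "ts_day"),
    ),
)
```

   Running all checks in `__init__` means an invalid PartitionSpec can never be 
observed, at the cost of a constructor that does more work than a plain data 
holder.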
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

