tanmayrauth commented on issue #1080:
URL: https://github.com/apache/iceberg-go/issues/1080#issuecomment-4455030135

PartitionSpec (TruncateTransform): controls which rows go into which files. All rows sharing the same truncated prefix land in the same partition's data files, which is what enables file-level pruning during scans.
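As a rough illustration of that grouping, here is a minimal plain-Go sketch (no iceberg-go involved). It assumes string values and follows the Iceberg rule that a string is truncated to at most W Unicode code points. `truncateString` and `partitionByTruncate` are hypothetical helpers, the project names are made up, and width 5 is used only to keep the output short.

```go
package main

import (
	"fmt"
	"sort"
)

// truncateString mimics Iceberg's truncate transform for strings: keep at
// most width Unicode code points. Hypothetical helper, not iceberg-go code.
func truncateString(s string, width int) string {
	r := []rune(s)
	if len(r) <= width {
		return s
	}
	return string(r[:width])
}

// partitionByTruncate groups values by their truncated prefix. Each map key
// stands in for one partition; every value in a group would be written to
// that partition's data files.
func partitionByTruncate(values []string, width int) map[string][]string {
	parts := make(map[string][]string)
	for _, v := range values {
		k := truncateString(v, width)
		parts[k] = append(parts[k], v)
	}
	return parts
}

func main() {
	// Hypothetical project names.
	projects := []string{"alpha-web", "alpha-api", "beta-web", "alpha-etl", "beta-db"}
	parts := partitionByTruncate(projects, 5)

	// Print partitions in a deterministic order.
	keys := make([]string, 0, len(parts))
	for k := range parts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%q -> %v\n", k, parts[k])
	}
}
```

Here the three `alpha-*` rows end up under `"alpha"` and the two `beta-*` rows under `"beta-"`, so a scan filtering on `alpha-*` values never has to open the `"beta-"` files.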
                                                                                
                                                                                
                                                                                
                                                                               
SortOrder (IdentityTransform): controls the physical row ordering within each data file. This is what keeps the Parquet column-chunk min/max statistics tight within a file, enabling row-group skipping.
                                                                                
     
                                                                                
                                                                                
                                                                                
                                                                               
Best practice for your case:
                                                                                
                                                                                
                                                                               
                     
```go
// Partition on a 20-character prefix of field 2: rows sharing that prefix
// are written to the same partition's data files.
partitionSpec := iceberg.NewPartitionSpec(
    iceberg.PartitionField{
        SourceIDs: []int{2},
        Transform: iceberg.TruncateTransform{Width: 20},
        Name:      "project_partition",
    },
)

// Within each data file, order rows by the full (untruncated) value of field 2.
sortField := table.SortField{
    SourceIDs: []int{2},
    Transform: iceberg.IdentityTransform{},
    Direction: table.SortASC,
    NullOrder: table.NullsLast,
}

sortOrder, err := table.NewSortOrder(table.InitialSortOrderID, []table.SortField{sortField})
if err != nil {
    return err
}

tbl, err := cat.CreateTable(ctx, tableIdent, icebergSchema,
    catalog.WithPartitionSpec(&partitionSpec),
    catalog.WithSortOrder(sortOrder),
)
if err != nil {
    return err
}
```

You don't want TruncateTransform in the sort order. Sorting by the truncated value gives rows no useful order: every row within a partition already shares the same truncated prefix, so all of them compare as equal under that sort key. Use `IdentityTransform{}` instead to sort by the full raw value, which gives you the tightest possible min/max stats within each file.
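A small sketch of that point under the same illustrative assumptions (a hypothetical `truncate` helper standing in for Iceberg's string truncate, made-up sample data): inside one partition every row has the same truncated key, so a stable sort by that key leaves the input order untouched, while sorting by the identity value produces a real ordering. A writer that does not sort stably would leave such ties in arbitrary order, which is no better.

```go
package main

import (
	"fmt"
	"sort"
)

// truncate keeps at most width Unicode code points, mirroring Iceberg's
// truncate transform for strings (hypothetical helper, not iceberg-go code).
func truncate(s string, width int) string {
	r := []rune(s)
	if len(r) <= width {
		return s
	}
	return string(r[:width])
}

// sortBy returns a copy of rows stably sorted by key(row). Rows whose keys
// compare equal keep their original relative order.
func sortBy(rows []string, key func(string) string) []string {
	out := append([]string(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool { return key(out[i]) < key(out[j]) })
	return out
}

func main() {
	// Every row shares the truncate(5) prefix "alpha", i.e. one partition.
	partition := []string{"alpha-web", "alpha-api", "alpha-ui", "alpha-db"}

	byTruncate := sortBy(partition, func(s string) string { return truncate(s, 5) })
	byIdentity := sortBy(partition, func(s string) string { return s })

	fmt.Println("truncate sort:", byTruncate) // all keys equal: order unchanged
	fmt.Println("identity sort:", byIdentity) // ordered by the full value
}
```

This prints `truncate sort: [alpha-web alpha-api alpha-ui alpha-db]` and `identity sort: [alpha-api alpha-db alpha-ui alpha-web]`.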
                                                                                
                                                                                
                                                                                
                                                                               
In short: partition spec = which file a row lands in; sort order = how rows are arranged within that file. Use truncate for the former, identity for the latter.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

