ErenDursun commented on PR #692:
URL: https://github.com/apache/iceberg-go/pull/692#issuecomment-3792418097

   Just had a closer look into it and it seems the field `Specs` is a list
   
https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1210
   because it is defined as such in the [Iceberg 
docs](https://iceberg.apache.org/spec/#table-metadata-fields):
   > partition-specs: A list of partition specs, stored as full partition spec 
objects.
   
   Changing its type in the `commonMetadata` struct would interfere with the 
(un)marshalling logic and changing it in the `Metadata` interface would let it 
diverge from the Iceberg specs. I could instead add an internal field 
`specsByID map[int]iceberg.PartitionSpec` and an internal method 
`partitionSpecsByID() map[int]iceberg.PartitionSpec` with lazy-loading logic. 
That way the methods
   
https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1316-L1324
   and 
   
https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1430-L1439
   could use the map for better performance.
   
   If we'd also add the new internal lazy-loading method to the [Metadata 
interface](https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L68)
 then the method from this PR with the 
[bugfix](https://github.com/ErenDursun/iceberg-go/blob/6fb11008ef1dd26f24857ab5c7012573116db2ae/table/scanner.go#L245-L257)
   ```
   func (scan *Scan) buildManifestEvaluator(specID int) 
(func(iceberg.ManifestFile) (bool, error), error) {
        i := slices.IndexFunc(scan.metadata.PartitionSpecs(), func(spec 
iceberg.PartitionSpec) bool {
                return spec.ID() == specID
        })
        if i < 0 {
                return nil, fmt.Errorf("partition spec with id %d not found in 
metadata", specID)
        }
   
        spec := scan.metadata.PartitionSpecs()[i]
   
        return newManifestEvaluator(spec, scan.metadata.CurrentSchema(),
                scan.partitionFilters.Get(specID), scan.caseSensitive)
   }
   ```
   could also work with the map. This method here could also be simplified:
   
https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/update_spec.go#L407-L411
   
   Buuuut... I'm not sure if the minor performance boost justifies the 
additional lazy-loading complexity 😅 there's the risk of future changes to the 
code leading to diverging states between the partition-spec list and map. 
Should I give it a try or do we keep the lists and just prevent the out of 
range errors for now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to