ErenDursun commented on PR #692: URL: https://github.com/apache/iceberg-go/pull/692#issuecomment-3792418097
Just had a closer look into it and it seems the field `Specs` is a list https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1210 because it is defined as such in the [Iceberg docs](https://iceberg.apache.org/spec/#table-metadata-fields): > partition-specs: A list of partition specs, stored as full partition spec objects. Changing its type in the `commonMetadata` struct would interfere with the (un)marshalling logic and changing it in the `Metadata` interface would let it diverge from the Iceberg specs. I could instead add an internal field `specsByID map[int]iceberg.PartitionSpec` and an internal method `partitionSpecsByID() map[int]iceberg.PartitionSpec` with lazy-loading logic. That way the methods https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1316-L1324 and https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L1430-L1439 could use the map for better performance. If we'd also add the new internal lazy-loading method to the [Metadata interface](https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/metadata.go#L68) then the method from this PR with the [bugfix](https://github.com/ErenDursun/iceberg-go/blob/6fb11008ef1dd26f24857ab5c7012573116db2ae/table/scanner.go#L245-L257) ``` func (scan *Scan) buildManifestEvaluator(specID int) (func(iceberg.ManifestFile) (bool, error), error) { i := slices.IndexFunc(scan.metadata.PartitionSpecs(), func(spec iceberg.PartitionSpec) bool { return spec.ID() == specID }) if i < 0 { return nil, fmt.Errorf("partition spec with id %d not found in metadata", specID) } spec := scan.metadata.PartitionSpecs()[i] return newManifestEvaluator(spec, scan.metadata.CurrentSchema(), scan.partitionFilters.Get(specID), scan.caseSensitive) } ``` could also work with the map. This method here could also be simplified: https://github.com/apache/iceberg-go/blob/3ce3b4bb51c0aa060d2bf6826dbbfd453c8067f9/table/update_spec.go#L407-L411 Buuuut... I'm not sure if the minor performance boost justifies the additional lazy-loading complexity 😅 there's the risk of future changes to the code leading to diverging states between the partition-spec list and map. Should I give it a try or do we keep the lists and just prevent the out of range errors for now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
