sfc-gh-mbojanczyk opened a new pull request, #344: URL: https://github.com/apache/arrow-go/pull/344
### Rationale for this change

This adds a basic Variant encoder/decoder to start the process of supporting the new [Variant encoding spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md) in the Apache Go Parquet library. Variants are useful for efficiently storing and accessing semi-structured data, especially in things like Iceberg tables.

### What changes are included in this PR?

This adds logic to encode and decode Variants, but does not yet plumb that logic through to either Arrow or Parquet. The PR's getting beefy as is, and this seems to be a good standalone unit to get feedback on.

Handling of decimal primitives is still to be implemented.

For ease of implementation, the Metadata keys are only stored in unsorted order. This makes the creation of an encoded Variant simpler, as one can serialize data as it's being added. For sorted Metadata keys to work, you'd need to buffer data and only create objects at the very end so that the appropriate width of indices can be chosen.

### Are these changes tested?

There are unit tests throughout to check that marshaling produces the expected binary output as per the spec, and that unmarshaling produces the expected values. The tests cover several levels, from individual marshaling pieces to the marshaling and unmarshaling of entire Variants.

### Are there any user-facing changes?

With this PR, no. This is simply a library to create Variants; it does not plumb the output into Parquet or Arrow.
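To illustrate the unsorted-keys trade-off described under "What changes are included in this PR?", here is a minimal sketch. It is not the code in this PR: `keyDict`, `sortedIDs`, and `bytesNeeded` are hypothetical names made up for the example. It shows that insertion-order IDs are final as soon as a key is added, while sorted IDs (and therefore the index width) are only known once every key has been seen.

```go
package main

import (
	"fmt"
	"sort"
)

// keyDict is a hypothetical insertion-ordered key dictionary (not the PR's type).
type keyDict struct {
	ids  map[string]int
	keys []string
}

func newKeyDict() *keyDict { return &keyDict{ids: make(map[string]int)} }

// add returns the ID for key, assigning the next free ID the first time it is seen.
// The ID never changes afterwards, so it can be serialized immediately.
func (d *keyDict) add(key string) int {
	if id, ok := d.ids[key]; ok {
		return id
	}
	id := len(d.keys)
	d.ids[key] = id
	d.keys = append(d.keys, key)
	return id
}

// sortedIDs returns the ID each key would get in a sorted dictionary.
// It needs the complete key set, which is why sorted mode implies buffering.
func sortedIDs(keys []string) map[string]int {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	ids := make(map[string]int, len(sorted))
	for i, k := range sorted {
		ids[k] = i
	}
	return ids
}

// bytesNeeded returns the smallest width (1 to 4 bytes) that can hold maxVal.
func bytesNeeded(maxVal int) int {
	switch {
	case maxVal <= 0xFF:
		return 1
	case maxVal <= 0xFFFF:
		return 2
	case maxVal <= 0xFFFFFF:
		return 3
	default:
		return 4
	}
}

func main() {
	d := newKeyDict()
	fmt.Println(d.add("b"), d.add("a"), d.add("b")) // 0 1 0
	fmt.Println(sortedIDs(d.keys))                  // map[a:0 b:1]
	fmt.Println(bytesNeeded(255), bytesNeeded(256)) // 1 2
}
```

In the sketch, `"b"` gets ID 0 in insertion order but would get ID 1 in a sorted dictionary once `"a"` arrives, so any field ID already written for `"b"` would have to be rewritten.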
