sfc-gh-mbojanczyk opened a new pull request, #344:
URL: https://github.com/apache/arrow-go/pull/344

   ### Rationale for this change
   This adds a basic Variant encoder/decoder to start the process of supporting 
the new [Variant encoding 
spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md) 
in the arrow-go Parquet library. Variants are useful for efficiently storing 
and accessing semi-structured data, for example in Iceberg tables.
   
   ### What changes are included in this PR?
   This adds logic to encode and decode Variants, but does not yet plumb that 
logic through to either Arrow or Parquet. The PR is already getting large, and 
this seems like a good standalone unit to get feedback on.
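
   As a rough sketch of the decode side, the hypothetical `decodePrimitive` below splits the header byte back into basic type and value header, then reads a few primitive types. It assumes the same spec layout (type IDs 0 = null, 1/2 = booleans, 3 = int8, 5 = int32, 6 = int64, little-endian payloads) and handles only those plus short strings; objects and arrays are out of scope, and none of these names come from the PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("variant value truncated")

// splitHeader returns the 2-bit basic_type and the 6-bit value_header.
func splitHeader(h byte) (basicType, valueHeader byte) {
	return h & 0x03, h >> 2
}

// decodePrimitive decodes primitives and short strings into Go values.
func decodePrimitive(value []byte) (any, error) {
	if len(value) == 0 {
		return nil, errTruncated
	}
	basicType, valueHeader := splitHeader(value[0])
	body := value[1:]
	switch basicType {
	case 0: // primitive: value_header is the primitive type ID
		return decodeByTypeID(valueHeader, body)
	case 1: // short string: value_header is the length in bytes
		n := int(valueHeader)
		if len(body) < n {
			return nil, errTruncated
		}
		return string(body[:n]), nil
	default: // objects (2) and arrays (3) are not handled in this sketch
		return nil, fmt.Errorf("basic type %d not handled in this sketch", basicType)
	}
}

// decodeByTypeID reads the payload that follows a primitive header.
func decodeByTypeID(typeID byte, body []byte) (any, error) {
	switch typeID {
	case 0:
		return nil, nil
	case 1:
		return true, nil
	case 2:
		return false, nil
	case 3:
		if len(body) < 1 {
			return nil, errTruncated
		}
		return int8(body[0]), nil
	case 5:
		if len(body) < 4 {
			return nil, errTruncated
		}
		return int32(binary.LittleEndian.Uint32(body)), nil
	case 6:
		if len(body) < 8 {
			return nil, errTruncated
		}
		return int64(binary.LittleEndian.Uint64(body)), nil
	default:
		return nil, fmt.Errorf("primitive type %d not handled in this sketch", typeID)
	}
}

func main() {
	inputs := [][]byte{
		{0x0C, 0xFF},                   // int8 -1
		{0x14, 0x01, 0x00, 0x00, 0x00}, // int32 1
		{0x09, 'h', 'i'},               // short string "hi"
		{0x04},                         // boolean true
	}
	for _, in := range inputs {
		v, err := decodePrimitive(in)
		fmt.Println(v, err)
	}
}
```

Each input here is the byte sequence an encoder following the same layout would produce, so decoding it should give back the original Go value.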
   
   Handling of decimal primitives is still to be implemented.
   
   For ease of implementation, Metadata keys are always stored unsorted. This 
makes creating an encoded Variant simpler, since data can be serialized as it 
is added. Supporting sorted Metadata keys would require buffering data and only 
creating objects at the very end, once all keys are known, so that the correct 
field IDs and an appropriate width for indices can be chosen.
   
   ### Are these changes tested?
   Unit tests throughout verify that marshaling produces the expected binary 
output per the spec and that unmarshaling returns the expected values. The 
tests cover several levels, from individual marshaling helpers up to 
marshaling and unmarshaling entire Variants.
   
   ### Are there any user-facing changes?
   Not with this PR. It only adds a library for creating Variants; the output 
is not yet plumbed into Parquet or Arrow.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
