Posnet opened a new pull request, #5679:
URL: https://github.com/apache/arrow-rs/pull/5679

   Some CSV files omit trailing columns from a row and expect the reader to 
treat the missing values as null (if the schema allows it). Similar behavior is 
supported by the Rust csv crate, pandas, arrow-cpp and polars. This commit adds 
an option to treat such missing columns as null. The default behavior is still 
to treat an incorrect number of columns as an error.
   
   # Which issue does this PR close?
   
   Closes #5678
    
   
   # What changes are included in this PR?
   
   Added boolean config fields to Format and Decoder that specify whether 
flexible columns are desired, along with builder methods to set them.
   The change also passes the new flexible fields down to the rust csv config.
   The change also updates the decoder so that, if flexible columns are 
enabled, it pads the offset buffers with empty offsets, which allows the 
missing fields to be interpreted as null values at parsing.
   The change also adds unit tests to verify the new behavior.
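   
   The offset padding described above can be illustrated with a minimal, 
standalone sketch. This is not the code from the PR: `PaddedRows`, `decode` and 
`get` are hypothetical names, the splitting is a naive comma split with no 
quoting support, and treating an empty field as null is an assumption made for 
the illustration.

```rust
/// Hypothetical stand-in for a decoded batch of CSV rows: one flat text
/// buffer plus end offsets, with exactly `num_columns` fields per row.
/// Assumes `num_columns > 0`.
struct PaddedRows {
    data: String,
    /// offsets[0] is 0, followed by one end offset per field.
    offsets: Vec<usize>,
    num_columns: usize,
}

impl PaddedRows {
    /// Naive decoder (no quoting or escaping). When `flexible` is true,
    /// rows with too few fields are padded with empty offsets; otherwise
    /// any row with the wrong number of fields is an error.
    fn decode(input: &str, num_columns: usize, flexible: bool) -> Result<Self, String> {
        let mut data = String::new();
        let mut offsets = vec![0];
        for (line_no, line) in input.lines().enumerate() {
            let mut fields = 0;
            for field in line.split(',') {
                data.push_str(field);
                offsets.push(data.len());
                fields += 1;
            }
            // Too many fields is always an error; too few only when not flexible.
            if fields > num_columns || (fields < num_columns && !flexible) {
                return Err(format!(
                    "line {}: expected {} columns, found {}",
                    line_no + 1,
                    num_columns,
                    fields
                ));
            }
            // Pad: repeat the current end offset so each missing field has
            // zero length and every row still has `num_columns` offsets.
            for _ in fields..num_columns {
                offsets.push(data.len());
            }
        }
        Ok(Self { data, offsets, num_columns })
    }

    fn num_rows(&self) -> usize {
        (self.offsets.len() - 1) / self.num_columns
    }

    /// Returns `None` for a zero-length field (treated as null in this sketch).
    fn get(&self, row: usize, col: usize) -> Option<&str> {
        let i = row * self.num_columns + col;
        let s = &self.data[self.offsets[i]..self.offsets[i + 1]];
        if s.is_empty() { None } else { Some(s) }
    }
}
```

   Because every row ends up with the same number of offsets, the later 
parsing step can index fields by `row * num_columns + col` without knowing 
which rows were short. For example, decoding `"1,2,3\n4,5\n"` with three 
columns and `flexible` set yields a null in the last column of the second row, 
while the same input with `flexible` unset is rejected.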
   
   # Are there any user-facing changes?
   
   Yes, it extends the API surface of the Format struct and ReaderBuilder 
struct. It also adds a set_flexible_lengths method to the RecordDecoder struct, 
since adding the field to the `new` constructor would be a public API break. I 
believe the changes are only semantically minor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

