MetalBlueberry opened a new pull request, #375:
URL: https://github.com/apache/arrow-go/pull/375

   ### Rationale for this change
   
   I've been looking for a way to convert Parquet files to CSV. The arrow-go library does a good job, until I ran into two limitations:
   
   - Not every type was supported
   - It was impossible to change how types were formatted
   
   My goal is to generate CSV that can be used with the `COPY TO` statement in PostgreSQL. For example, I needed to make sure binary fields are hex encoded instead of base64.
   
   After thinking about it, I decided to add a custom type converter option that allows any user of the csv.Writer to change the formatting for specific types. It is simply a function that is called before the standard mapping, so custom logic gets the first chance to handle a type.
   
   The `TestCustomTypeConversion` test in this PR shows how it can be used; the sketch below illustrates the idea.
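   
   A rough illustration of the mechanism (the real option name and callback signature are the ones added in this PR's diff; `convertValue` below is an assumed stand-in, not the actual API): the hook sees each cell before the writer's built-in mapping, and returning `false` falls through to the default formatting.

```go
package main

import (
	"encoding/hex"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

// convertValue is an illustrative stand-in for the converter hook added in
// this PR (the name and signature here are assumptions for this sketch).
// It runs before the writer's standard type mapping; ok=false means
// "not handled, use the default formatting".
func convertValue(col arrow.Array, row int) (s string, ok bool) {
	if col.DataType().ID() == arrow.BINARY {
		// Hex-encode binary cells instead of the default base64,
		// matching the PostgreSQL-oriented use case above.
		return hex.EncodeToString(col.(*array.Binary).Value(row)), true
	}
	return "", false
}

func main() {
	bld := array.NewBinaryBuilder(memory.DefaultAllocator, arrow.BinaryTypes.Binary)
	defer bld.Release()
	bld.Append([]byte{0xde, 0xad, 0xbe, 0xef})

	col := bld.NewArray()
	defer col.Release()

	if s, ok := convertValue(col, 0); ok {
		fmt.Println(s) // prints "deadbeef"
	}
}
```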
   
   ### What changes are included in this PR?
   
   - Add a `CustomTypeConversion` option to csv.Writer
   - Remove csv.Writer schema type validation (it will now fail on write if a type is not handled)
   - Add tests for csv.Writer based on the apache/parquet-testing files `alltypes_plain.parquet` and `delta_byte_array.parquet`
   - Add test for `CustomTypeConversion` 
   
   ### Are these changes tested?
   
   Yes
   
   ### Are there any user-facing changes?
   
   - Invalid schemas won't fail on csv.Writer creation but on the first write. I believe this is easier, as it keeps a single source of truth for type validation.
   - If the `CustomTypeConversion` option is not set, the behavior is unchanged.
   

