[ 
https://issues.apache.org/jira/browse/ARROW-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510047#comment-17510047
 ] 

Matthew Topol commented on ARROW-15544:
---------------------------------------

[~antoinegelloz] I've put up a fix for this and opened a PR to merge it. If you
get a chance, you can test it yourself by running the following command in your
module directory:

{code}
go mod edit -replace github.com/apache/arrow/go/v8/parquet=github.com/zeroshade/arrow/go/v8/parquet@arrow-15544-origin-schema
{code}

Then rebuild. It should pull down my branch and use it in place of master. I've
added a test case which shows it working for both padded and unpadded encodings,
but feel free to try it with your test case directly and get back to me.
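For readers following along, here is a minimal sketch of padding-tolerant decoding using only the standard library. It assumes the "ARROW:schema" value is standard-alphabet base64 that may or may not end in '=' padding. The helper name decodeOriginSchema and the "hello" payload are hypothetical illustrations; this is not the code from the PR.

{code}
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeOriginSchema is a hypothetical helper, not the code from the PR.
// It accepts standard-alphabet base64 with or without trailing '=' padding.
func decodeOriginSchema(value string) ([]byte, error) {
	if strings.HasSuffix(value, "=") {
		// Padded input, like the pandas-written files from the report:
		// StdEncoding requires and validates the padding.
		return base64.StdEncoding.DecodeString(value)
	}
	// Unpadded input: RawStdEncoding expects no '=' characters.
	return base64.RawStdEncoding.DecodeString(value)
}

func main() {
	payload := []byte("hello") // 5 bytes, so the padded form ends in '='
	padded := base64.StdEncoding.EncodeToString(payload)
	unpadded := base64.RawStdEncoding.EncodeToString(payload)

	// Both forms should decode back to the same bytes.
	for _, enc := range []string{padded, unpadded} {
		out, err := decodeOriginSchema(enc)
		fmt.Printf("%q -> %q (err: %v)\n", enc, out, err)
	}
}
{code}

Dispatching on the trailing '=' keeps each path strict: padded input still has its padding validated by StdEncoding, and unpadded input is decoded exactly as it was with RawStdEncoding alone.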

> [Go][Parquet] pqarrow.getOriginSchema error while decoding ARROW:schema
> -----------------------------------------------------------------------
>
>                 Key: ARROW-15544
>                 URL: https://issues.apache.org/jira/browse/ARROW-15544
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go, Parquet
>    Affects Versions: 7.0.0
>         Environment: go1.17, python3.8
>            Reporter: Antoine Gelloz
>            Assignee: Matthew Topol
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hello!
> This is my first time participating in the open source community as a junior
> developer, and I would like to thank you all for your hard work :)
> I am using the new pqarrow package in our project
> [Metronlab/bow|https://github.com/Metronlab/bow] to read Parquet files
> previously written by pandas.
> The getOriginSchema function returns an error if the base64-encoded
> "ARROW:schema" value ends with padding characters.
> This is caused by the use of
> [RawStdEncoding|https://pkg.go.dev/encoding/base64#pkg-variables], an encoding
> that omits padding characters and therefore rejects them when decoding.
> Is there any reason for using raw encoding instead of standard encoding?
> Here is a repo with a test script that demonstrates the problem:
> [antoinegelloz/arrowparquet|https://github.com/antoinegelloz/arrowparquet]
> Thank you in advance for your help,
> Antoine Gelloz
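The failure mode described in the report can be reproduced with the standard library alone. This sketch uses plain encoding/base64 with no Arrow code, and the "hello" payload is an arbitrary example. It shows that base64.RawStdEncoding rejects a value that carries '=' padding, while base64.StdEncoding accepts it.

{code}
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Five bytes is not a multiple of three, so standard encoding pads the output.
	padded := base64.StdEncoding.EncodeToString([]byte("hello"))

	// RawStdEncoding treats '=' as an illegal character and returns a CorruptInputError.
	if _, err := base64.RawStdEncoding.DecodeString(padded); err != nil {
		fmt.Println("RawStdEncoding:", err)
	}

	// StdEncoding expects the padding and decodes the value correctly.
	out, err := base64.StdEncoding.DecodeString(padded)
	fmt.Println("StdEncoding:", string(out), err)
}
{code}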



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
