[jira] [Updated] (ARROW-14432) created_by is not exposed in the python wrapper, creating reader side issue.

Kevin (Jira) Thu, 21 Oct 2021 17:59:04 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kevin updated ARROW-14432:
--------------------------
    Description: 
 

The current Python wrapper does NOT expose the created_by builder option (used when writing Parquet files to disk):

[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L361]

 

However, it is available in the C++ version:

[https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/cpp/src/parquet/properties.h#L249]

[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L320]

 

This causes a problem when the Hadoop Parquet reader reads a Parquet file 
written by pyarrow. See this Stack Overflow question:

[https://stackoverflow.com/questions/69658140/how-to-save-a-parquet-with-pandas-using-same-header-than-hadoop-spark-parquet?noredirect=1#comment123131862_69658140]

 

The development effort should be minimal.


  was:
 

Current python wrapper does NOT expose

created_by

[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L361]

 

But, this is available in CPP version:

[https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/cpp/src/parquet/properties.h#L249]

[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L320]

 

This creates an issue when Hadoop parquet reader reads this pyarrow parquet 
file:

SO :

[https://stackoverflow.com/questions/69658140/how-to-save-a-parquet-with-pandas-using-same-header-than-hadoop-spark-parquet?noredirect=1#comment123131862_69658140]

 

Development should be minimal



> created_by is not exposed in the python wrapper, creating reader side issue.
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-14432
>                 URL: https://issues.apache.org/jira/browse/ARROW-14432
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Kevin
>            Priority: Major
>
>  
> The current Python wrapper does NOT expose the created_by builder option (used when writing Parquet files to disk):
> [https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L361]
>  
> However, it is available in the C++ version:
> [https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/cpp/src/parquet/properties.h#L249]
> [https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L320]
>  
> This causes a problem when the Hadoop Parquet reader reads a Parquet file 
> written by pyarrow. See this Stack Overflow question:
> [https://stackoverflow.com/questions/69658140/how-to-save-a-parquet-with-pandas-using-same-header-than-hadoop-spark-parquet?noredirect=1#comment123131862_69658140]
>  
> The development effort should be minimal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
