[jira] [Commented] (ARROW-13269) [C++] [Dataset] pyarrow.parquet.write_to_dataset does not send full schema to metadata_collector

Joris Van den Bossche (Jira) Wed, 07 Jul 2021 10:24:24 -0700


    [ https://issues.apache.org/jira/browse/ARROW-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376723#comment-17376723 ]


Joris Van den Bossche commented on ARROW-13269:
-----------------------------------------------

Side note: there is also the question of whether we should drop partition
columns at all from the files written for a partitioned dataset. Based on a
previous conversation on the mailing list, it seems that some other systems
do not exclude those columns. At the time, I opened ARROW-10347 to check that
we can _read_ such datasets (where the partitioning and the file columns carry
duplicate information). But we should maybe also consider whether we want to
be able to _write_ such datasets.
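
To make the two layouts concrete, here is a minimal pure-Python sketch of
Hive-style partitioning. It is not pyarrow code and does not reflect how the
dataset writer is implemented; `partition_rows` and `keep_partition_cols` are
hypothetical names used only for illustration.

```python
from collections import defaultdict


def partition_rows(rows, partition_cols, keep_partition_cols=False):
    """Group rows by Hive-style directory, e.g. 'year=2021'.

    Hypothetical helper for illustration; not pyarrow's implementation.
    """
    groups = defaultdict(list)
    for row in rows:
        # The partition values are encoded in the directory name.
        directory = "/".join(f"{col}={row[col]}" for col in partition_cols)
        if keep_partition_cols:
            # Layout some other systems appear to use: the file repeats
            # the information already present in the path.
            file_row = dict(row)
        else:
            # Layout described in this issue: partition columns are
            # dropped from what is stored in the file.
            file_row = {k: v for k, v in row.items() if k not in partition_cols}
        groups[directory].append(file_row)
    return dict(groups)


rows = [
    {"year": 2020, "value": 1.5},
    {"year": 2021, "value": 2.5},
]

dropped = partition_rows(rows, ["year"])
kept = partition_rows(rows, ["year"], keep_partition_cols=True)
```

In the `dropped` layout the per-file columns no longer include `year`, so any
schema derived only from the file contents is missing the partition field. In
the `kept` layout the value appears both in the path and in the file.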

> [C++] [Dataset] pyarrow.parquet.write_to_dataset does not send full schema to 
> metadata_collector
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13269
>                 URL: https://issues.apache.org/jira/browse/ARROW-13269
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 4.0.0
>            Reporter: Weston Pace
>            Priority: Major
>
> If partition columns are specified, then the writers will only write the 
> non-partition columns, and thus the written files will not contain the 
> fields used for the partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
