[jira] [Commented] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer

Joris Van den Bossche (Jira) Mon, 11 Oct 2021 08:54:04 -0700


    [ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427218#comment-17427218 ]


Joris Van den Bossche commented on ARROW-14196:
-----------------------------------------------

bq. 2. Make it possible to select columns by eliding the list name components.

I think the C++ API currently only deals with column indices (at least for
the Python bindings, the translation of column names to field indices happens
in Python). For Python, eliding the list name components should be relatively
straightforward to implement. I opened ARROW-14286 for this.

bq. If so, I'd have to support both naming conventions because both would exist 
in the wild.

[~jpivarski] Yes, but that is already the case today. Parquet files written
by (py)arrow use a different name for the list element than Parquet files
written by other tools (that is actually what we are trying to harmonize). So
if you select a subfield of a list field by name, you already need to account
for potentially different names.

> [C++][Parquet] Default to compliant nested types in Parquet writer
> ------------------------------------------------------------------
>
>                 Key: ARROW-14196
>                 URL: https://issues.apache.org/jira/browse/ARROW-14196
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Parquet
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 6.0.0
>
>
> In C++ there is already a "compliant_nested_types" option (to make list
> columns follow the Parquet specification), and ARROW-11497 exposed this
> option in Python.
> It is still set to False by default, but the source contains the comment
> "TODO: At some point we should flip this.", and ARROW-11497 also included
> some discussion of what it would take to change the default.
> cc [~emkornfield] [~apitrou]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
