[ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427218#comment-17427218 ]
Joris Van den Bossche commented on ARROW-14196:
-----------------------------------------------
bq. 2. Make it possible to select columns by eliding the list name components.
Currently, I think the C++ API only deals with column indices (at least for
the Python bindings, the translation of column names to field indices happens
in Python). For Python, that should be relatively straightforward to implement.
Opened ARROW-14286 for this.
bq. If so, I'd have to support both naming conventions because both would exist
in the wild.
[~jpivarski] Yes, but that's already the case right now as well. Parquet files
written by (py)arrow use a different name for the list element than Parquet
files written by other tools (that's actually what we are trying to
harmonize). So if you select a subfield of a list field by name, you already
need to account for potentially different names at the moment.
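
To illustrate what eliding the list name components could look like on the
Python side, here is a rough sketch, not the actual pyarrow implementation.
It assumes leaf column paths are dotted strings, that Arrow's current default
names the list element "item" (giving "a.list.item.b"), and that the
spec-compliant name is "element" (giving "a.list.element.b"). The helper
names elide_list_components and select_leaf_indices are hypothetical.

```python
# Hypothetical helpers for illustration only; not part of pyarrow.

# Names used for the list element: "item" (Arrow's current default)
# and "element" (the name used by the Parquet specification).
LIST_ELEMENT_NAMES = ("item", "element")


def elide_list_components(path):
    """Drop the intermediate "list.<element name>" parts from a dotted path."""
    parts = path.split(".")
    kept = []
    i = 0
    while i < len(parts):
        is_list_wrapper = (
            i > 0
            and parts[i] == "list"
            and i + 1 < len(parts)
            and parts[i + 1] in LIST_ELEMENT_NAMES
        )
        if is_list_wrapper:
            i += 2  # skip both "list" and the element name
            continue
        kept.append(parts[i])
        i += 1
    return ".".join(kept)


def select_leaf_indices(leaf_paths, wanted):
    """Return indices of leaf columns equal to, or nested under, `wanted`."""
    indices = []
    for index, path in enumerate(leaf_paths):
        short = elide_list_components(path)
        if short == wanted or short.startswith(wanted + "."):
            indices.append(index)
    return indices


# The same selection works for both naming conventions.
arrow_style = ["x", "a.list.item.b", "a.list.item.c"]
compliant_style = ["x", "a.list.element.b", "a.list.element.c"]
print(select_leaf_indices(arrow_style, "a.b"))
print(select_leaf_indices(compliant_style, "a.b"))
```

One caveat with this approach: a struct field that is itself named "list"
and has a child named "item" or "element" would be elided as well, so a real
implementation would probably want to check the schema rather than rely on
names alone.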
> [C++][Parquet] Default to compliant nested types in Parquet writer
> ------------------------------------------------------------------
>
> Key: ARROW-14196
> URL: https://issues.apache.org/jira/browse/ARROW-14196
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Parquet
> Reporter: Joris Van den Bossche
> Priority: Major
> Fix For: 6.0.0
>
>
> In C++ there is already an option to get the "compliant_nested_types" (to
> have the list columns follow the Parquet specification), and ARROW-11497
> exposed this option in Python.
> This is still set to False by default, but in the source it says "TODO: At
> some point we should flip this.", and in ARROW-11497 there was also some
> discussion about what it would take to change the default.
> cc [~emkornfield] [~apitrou]