[
https://issues.apache.org/jira/browse/ARROW-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368688#comment-17368688
]
Joris Van den Bossche commented on ARROW-9997:
----------------------------------------------
To restate from my last comment above, regardless of the {{as_py}} discussion,
I think we can certainly already do:
- Fix the issue on the side of pyarrow's {{StructScalar}}, so that you can e.g.
get a field value by index instead of by name using pyarrow's APIs, and fix the
cases where we have wrong behaviour (like the contains example). That ensures
that at least our own {{StructScalar}} is usable and behaves properly with
duplicate field names.
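As a rough illustration of the behaviour described above, here is a minimal
pure-Python sketch. The class {{StructLikeScalar}} is a hypothetical stand-in
and not pyarrow's implementation; raising {{KeyError}} for an ambiguous name is
an assumption made for the example, not a decision taken in this issue.

```python
from typing import Any, List, Sequence, Tuple


class StructLikeScalar:
    """Hypothetical stand-in for a struct scalar; not pyarrow's implementation."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = list(names)
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __contains__(self, name: object) -> bool:
        # Membership is well defined even when a name occurs more than once.
        return name in self._names

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access is always unambiguous.
            return self._values[key]
        positions = [i for i, n in enumerate(self._names) if n == key]
        if not positions:
            raise KeyError(key)
        if len(positions) > 1:
            # Assumption for this sketch: refuse to guess which duplicate is meant.
            raise KeyError(f"field name {key!r} is ambiguous; use an index")
        return self._values[positions[0]]

    def items(self) -> List[Tuple[str, Any]]:
        # A list of pairs keeps every field, including duplicates.
        return list(zip(self._names, self._values))


s = StructLikeScalar(["a", "a", "b"], [1, 2, 3])
first_a, second_a = s[0], s[1]  # both "a" fields are reachable by index
```

Index access stays usable in every case, while name access only fails for the
names that are actually duplicated.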
That should already resolve one of the issues raised above. It doesn't resolve
the discussion about the return type of {{as_py()}} of course.
Returning a dictionary from {{as_py}} is IMO the most useful option in the
large majority of use cases (and I don't think it "should not fail in any
circumstance"). The reality is that there is not always a direct 1:1 mapping
to built-in Python structures.
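To make the missing 1:1 mapping concrete, a small sketch in plain Python (the
field names and values are made up for illustration): a built-in {{dict}} keeps
only one value per key, so converting duplicate-named fields to a dictionary
necessarily drops data.

```python
# Fields of a struct with a duplicated name, kept as ordered (name, value) pairs.
fields = [("a", 1), ("a", 2), ("b", 3)]

# A dict keeps one value per key; the later duplicate overwrites the earlier one.
as_dict = dict(fields)

# Number of field values that did not survive the conversion.
lost = len(fields) - len(as_dict)
```

Here {{as_dict}} ends up as {{{"a": 2, "b": 3}}} and one value is lost, whereas
the list of pairs preserves all three fields.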
> [Python] StructScalar.as_py() fails if the type has duplicate field names
> -------------------------------------------------------------------------
>
> Key: ARROW-9997
> URL: https://issues.apache.org/jira/browse/ARROW-9997
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Fix For: 5.0.0
>
>
> {{StructScalar}} currently extends an abstract Mapping interface. Since the
> type allows duplicate field names we cannot provide that API.