[ 
https://issues.apache.org/jira/browse/ARROW-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368688#comment-17368688
 ] 

Joris Van den Bossche commented on ARROW-9997:
----------------------------------------------

To restate from my last comment above, regardless of the {{as_py}} discussion, 
I think we can certainly already do:

- Fix the issue on the side of pyarrow's {{StructScalar}}, so that you can e.g. 
get a field value by index instead of by name using pyarrow's APIs, and fix the 
cases where we have wrong behaviour (like the contains example). This ensures 
that at least our own {{StructScalar}} is usable and behaves properly with 
duplicate field names.
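
As a rough illustration of the point above, here is a minimal pure-Python sketch. It is not pyarrow's implementation: the class {{StructLike}} and its behaviour for ambiguous names are assumptions made only for this example. It shows index-based access that keeps working with duplicate field names, a name lookup that refuses to guess, and a contains check defined on field names.

```python
class StructLike:
    """Hypothetical stand-in for a struct scalar (not pyarrow's StructScalar)."""

    def __init__(self, names, values):
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = list(names)
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __contains__(self, name):
        # Membership is defined on field names, duplicates or not.
        return name in self._names

    def __getitem__(self, key):
        # Positional access is always unambiguous.
        if isinstance(key, int):
            return self._values[key]
        # Name access only works when exactly one field has that name.
        matches = [i for i, n in enumerate(self._names) if n == key]
        if not matches:
            raise KeyError(key)
        if len(matches) > 1:
            raise KeyError(f"field name {key!r} is ambiguous; use an index")
        return self._values[matches[0]]


s = StructLike(["a", "a", "b"], [1, 2, 3])
first_a = s[0]    # first field named "a"
second_a = s[1]   # second field named "a"
only_b = s["b"]   # unique name, so lookup by name is fine
```

Raising on an ambiguous name is just one possible choice here; returning the first match would be another.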

That should already resolve one of the issues raised above. It doesn't resolve 
the discussion about the return type of {{as_py()}} of course. 

The return value of {{as_py}} as a dictionary is IMO the most useful one in the 
large majority of use cases (and I don't think it "should not fail in any 
circumstance"). It's a reality that there is not always a direct 1:1 mapping 
with built-in Python structures. 
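
To make the mapping problem concrete, here is a small sketch in plain Python. The helper {{to_py}} is hypothetical and only mirrors the idea of returning a dictionary where possible and failing otherwise; it is not what pyarrow does.

```python
names = ["a", "a", "b"]
values = [1, 2, 3]

# A dict silently keeps only the last value for a repeated key.
as_dict = dict(zip(names, values))

# A list of (name, value) pairs keeps everything, but is less convenient.
as_pairs = list(zip(names, values))


def to_py(names, values):
    """Hypothetical helper: return a dict, or fail if names are duplicated."""
    if len(set(names)) != len(names):
        raise ValueError("duplicate field names cannot be represented as a dict")
    return dict(zip(names, values))


unique_result = to_py(["x", "y"], [10, 20])
```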

> [Python] StructScalar.as_py() fails if the type has duplicate field names
> -------------------------------------------------------------------------
>
>                 Key: ARROW-9997
>                 URL: https://issues.apache.org/jira/browse/ARROW-9997
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Krisztian Szucs
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 5.0.0
>
>
> {{StructScalar}} currently extends an abstract Mapping interface. Since the 
> type allows duplicate field names we cannot provide that API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
