[jira] [Commented] (ARROW-9997) [Python] StructScalar.as_py() fails if the type has duplicate field names

Joris Van den Bossche (Jira) Thu, 22 Oct 2020 01:40:59 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218871#comment-17218871
 ]


Joris Van den Bossche commented on ARROW-9997:
----------------------------------------------

By multidict, do you mean something like 
https://multidict.readthedocs.io/en/stable/multidict.html ?

bq. My issue is StructScalar is an arrow object which implements a python 
mapping interface. Once we have duplicate keys the object stops to operate, we 
cannot do anything with it since all operation will raise a KeyError (not just 
when we call .as_py())

But that's something we could fix independently of {{as_py()}}, I think?

For example, {{__contains__}} relies on key access, so it gives the wrong 
result for a duplicated field name:

{code}
In [12]: s = pa.array([[('a', 1), ('b', 2), ('a', 3)]], pa.struct([('a', 'int64'), ('b', 'int64'), ('a', 'int64')]))[0]

In [13]: type(s)
Out[13]: pyarrow.lib.StructScalar

In [15]: list(s)
Out[15]: ['a', 'b', 'a']

In [16]: 'b' in s
Out[16]: True

In [17]: 'a' in s
Out[17]: False
{code}

So the mapping methods could be fixed on their own if we want, without 
changing the default {{as_py}} behaviour.
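
To illustrate what such a fix might look like, here is a minimal sketch in plain Python rather than pyarrow. The class {{DupFieldStruct}} is a hypothetical stand-in, not Arrow API. It assumes {{__contains__}} checks the field names directly instead of going through key access, so a duplicated name is still reported as present, while lookup by an ambiguous name keeps raising {{KeyError}}.

```python
class DupFieldStruct:
    """Hypothetical stand-in for a struct scalar; not part of pyarrow."""

    def __init__(self, fields):
        # fields: list of (name, value) pairs; names may repeat
        self._fields = list(fields)

    def __iter__(self):
        # Iterate over field names in order, duplicates included
        return (name for name, _ in self._fields)

    def __len__(self):
        return len(self._fields)

    def __getitem__(self, key):
        matches = [value for name, value in self._fields if name == key]
        if len(matches) != 1:
            # Name is either missing or ambiguous
            raise KeyError(key)
        return matches[0]

    def __contains__(self, key):
        # Compare against the names directly; never calls __getitem__,
        # so a duplicated name does not turn into a KeyError
        return any(name == key for name, _ in self._fields)


s = DupFieldStruct([("a", 1), ("b", 2), ("a", 3)])
print(list(s))   # field names in order
print("b" in s)  # unique name
print("a" in s)  # duplicated name
```

With this approach, membership tests agree with what {{list(s)}} reports, and only value lookup by a duplicated name remains an error.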

And we could also add a method to get a field value by index instead of by 
name, which you could use if there are duplicate fields. 
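
A rough sketch of the by-index idea, again in plain Python and not tied to the actual pyarrow implementation: the class {{IndexableStruct}} is hypothetical, and letting {{__getitem__}} accept an integer is just one possible design (a separate method would work as well).

```python
class IndexableStruct:
    """Hypothetical struct-like object; illustrates by-index access only."""

    def __init__(self, fields):
        # fields: list of (name, value) pairs; names may repeat
        self._fields = list(fields)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access is always unambiguous
            return self._fields[key][1]
        matches = [value for name, value in self._fields if name == key]
        if len(matches) != 1:
            # Missing or duplicated name
            raise KeyError(key)
        return matches[0]


s = IndexableStruct([("a", 1), ("b", 2), ("a", 3)])
first_a = s[0]   # first field named 'a'
second_a = s[2]  # second field named 'a'
by_name = s["b"] # unique names still work
print(first_a, second_a, by_name)
```

Both values stored under the duplicated name stay reachable through their position, even though lookup by that name is ambiguous.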

> [Python] StructScalar.as_py() fails if the type has duplicate field names
> -------------------------------------------------------------------------
>
>                 Key: ARROW-9997
>                 URL: https://issues.apache.org/jira/browse/ARROW-9997
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Krisztian Szucs
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {{StructScalar}} currently extends an abstract Mapping interface. Since the 
> type allows duplicate field names we cannot provide that API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
