[ https://issues.apache.org/jira/browse/ARROW-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218871#comment-17218871 ]
Joris Van den Bossche commented on ARROW-9997:
----------------------------------------------
With multidict, you mean something like
https://multidict.readthedocs.io/en/stable/multidict.html ?
bq. My issue is that StructScalar is an Arrow object which implements a Python
mapping interface. Once we have duplicate keys the object stops working; we
cannot do anything with it, since all operations will raise a KeyError (not
just when we call .as_py())
But that's something we could fix independently from {{as_py()}}, I think?
For example, {{__contains__}} relies on key access, so it gives wrong results for a duplicated name:
{code}
In [12]: s = pa.array([[('a', 1), ('b', 2), ('a', 3)]], pa.struct([('a', 'int64'), ('b', 'int64'), ('a', 'int64')]))[0]
In [13]: type(s)
Out[13]: pyarrow.lib.StructScalar
In [15]: list(s)
Out[15]: ['a', 'b', 'a']
In [16]: 'b' in s
Out[16]: True
In [17]: 'a' in s
Out[17]: False
{code}
But that's something we could fix if we want, without changing the default
{{as_py}} behaviour.
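To make the mechanism concrete, here is a minimal pure-Python model; it is an illustration, not pyarrow's actual code. It assumes, as the session above suggests, that looking up an ambiguous name raises {{KeyError}}. The {{__contains__}} inherited from {{collections.abc.Mapping}} catches that error and returns False, so overriding {{__contains__}} to compare names directly fixes membership without touching {{as_py}}. The class names {{DupStruct}} and {{FixedDupStruct}} are hypothetical.

{code:python}
from collections.abc import Mapping


class DupStruct(Mapping):
    """Hypothetical stand-in for a struct scalar whose field names may repeat."""

    def __init__(self, pairs):
        # Keep fields as an ordered list of (name, value) pairs so duplicates survive.
        self._pairs = list(pairs)

    def __len__(self):
        return len(self._pairs)

    def __iter__(self):
        # Iteration yields every name, duplicates included, like list(s) above.
        return (name for name, _ in self._pairs)

    def __getitem__(self, key):
        matches = [value for name, value in self._pairs if name == key]
        if len(matches) != 1:
            # Missing *or* ambiguous names both raise KeyError (assumed behaviour).
            raise KeyError(key)
        return matches[0]


class FixedDupStruct(DupStruct):
    def __contains__(self, key):
        # Check names directly instead of going through __getitem__.
        return any(name == key for name, _ in self._pairs)


s = DupStruct([('a', 1), ('b', 2), ('a', 3)])
print(list(s), 'b' in s, 'a' in s)   # ['a', 'b', 'a'] True False

fixed = FixedDupStruct([('a', 1), ('b', 2), ('a', 3)])
print('a' in fixed)                  # True
{code}

Only membership changes here; {{fixed['a']}} still raises {{KeyError}}, so the ambiguity in name lookup (and hence in a dict-returning {{as_py}}) is left as it is.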
And we could also add a method to get a field value by index instead of by
name, which you could use if there are duplicate fields.
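A sketch of what positional access could look like, using a hypothetical {{field_value(index)}} method on a stand-in class (neither the class nor the method name is pyarrow API). Because positions stay unique even when names repeat, every value remains reachable:

{code:python}
class IndexedStruct:
    """Hypothetical struct-like container that allows repeated field names."""

    def __init__(self, pairs):
        # Ordered (name, value) pairs; duplicates are kept as-is.
        self._pairs = list(pairs)

    def names(self):
        # Field names in declaration order, duplicates included.
        return [name for name, _ in self._pairs]

    def field_value(self, index):
        # Positional lookup is unambiguous even when names repeat.
        # Raises IndexError for an out-of-range position.
        return self._pairs[index][1]


s = IndexedStruct([('a', 1), ('b', 2), ('a', 3)])
print(s.field_value(0), s.field_value(2))  # 1 3

# Collect every value stored under the repeated name 'a'.
a_values = [s.field_value(i) for i, name in enumerate(s.names()) if name == 'a']
print(a_values)  # [1, 3]
{code}

Selecting all values for a repeated name then reduces to filtering positions by name, as in the last lines.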
> [Python] StructScalar.as_py() fails if the type has duplicate field names
> -------------------------------------------------------------------------
>
> Key: ARROW-9997
> URL: https://issues.apache.org/jira/browse/ARROW-9997
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Fix For: 3.0.0
>
>
> {{StructScalar}} currently extends an abstract Mapping interface. Since the
> type allows duplicate field names we cannot provide that API.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)