[ https://issues.apache.org/jira/browse/ARROW-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338414#comment-17338414 ]
Sergey Mozharov commented on ARROW-12609:
-----------------------------------------
Another question to think about is what the "length" of a ListScalar means.
If it can be defined as "how many wrapped objects can I get from this list
scalar?", then the validity of the wrapped object is irrelevant: both an empty
list and a null contain no wrapped objects, so the length of both scalars
would be 0.
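A minimal sketch of the definition proposed above, written as a hypothetical pure-Python model (`ListScalarModel` is illustrative only, not pyarrow's `ListScalar`). It assumes a null scalar stores `None` in place of its values and keeps validity available separately through an `is_valid` flag, so an empty list and a null remain distinguishable even though both report a length of 0.

```python
from typing import Any, Optional, Sequence


class ListScalarModel:
    """Hypothetical model of a list-like scalar (not pyarrow's class).

    `values` holds the wrapped items, or None when the scalar is null.
    """

    def __init__(self, values: Optional[Sequence[Any]]) -> None:
        self.values = values

    @property
    def is_valid(self) -> bool:
        # The scalar is null exactly when it wraps no sequence at all.
        return self.values is not None

    def __len__(self) -> int:
        # "Length" = number of wrapped objects retrievable from the scalar.
        # A null scalar wraps nothing, so it reports 0, like an empty list.
        if self.values is None:
            return 0
        return len(self.values)


empty = ListScalarModel([])
missing = ListScalarModel(None)
full = ListScalarModel([{"a": 1, "b": False}, {"a": 2, "b": True}])

print(len(empty), len(missing), len(full))  # 0 0 2
print(empty.is_valid, missing.is_valid)     # True False
```

Under this definition `len()` answers only "how many items", while the empty-versus-missing question is answered by `is_valid`.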
> TypeError when accessing length of an invalid ListScalar
> --------------------------------------------------------
>
> Key: ARROW-12609
> URL: https://issues.apache.org/jira/browse/ARROW-12609
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 3.0.0, 4.0.0
> Environment: Windows 10
> python=3.9.2
> pyarrow=4.0.0 (3.0.0 has the same behavior)
> Reporter: Sergey Mozharov
> Priority: Major
>
> For list-like data types, the scalar corresponding to a missing value has a
> '__len__' attribute, but a TypeError is raised when it is called:
> {code:python}
> import pyarrow as pa
> data_type = pa.list_(pa.struct([
> ('a', pa.int64()),
> ('b', pa.bool_())
> ]))
> data = [[{'a': 1, 'b': False}, {'a': 2, 'b': True}], None]
> arr = pa.array(data, type=data_type)
> missing_scalar = arr[1] # <pyarrow.ListScalar: None>
> assert hasattr(missing_scalar, '__len__')
> assert len(missing_scalar) == 0  # --> TypeError: object of type 'NoneType' has no len()
> {code}
> Expected behavior: the length should be 0.
> This issue causes several pandas unit tests to fail when an ExtensionArray
> backed by an Arrow array of this data type is built.
> This behavior is also inconsistent with a similar example where the data type
> is a struct:
> {code:python}
> import pyarrow as pa
> data_type = pa.struct([
> ('a', pa.int64()),
> ('b', pa.bool_())
> ])
> data = [{'a': 1, 'b': False}, None]
> arr = pa.array(data, type=data_type)
> missing_scalar = arr[1] # <pyarrow.StructScalar: None>
> assert hasattr(missing_scalar, '__len__')
> assert len(missing_scalar) == 0 # Ok
> {code}
> In this second example, no TypeError is raised.
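A caller that needs the 0-for-null behavior regardless of how this issue is resolved, such as the pandas ExtensionArray mentioned above, could check validity before calling `len()`. The sketch below assumes only that the scalar exposes an `is_valid` attribute (pyarrow scalars do) and supports `len()` when valid; `scalar_len` and `FakeListScalar` are hypothetical names, and the fake class exists only so the example runs without pyarrow while reproducing the reported TypeError.

```python
from typing import Any


def scalar_len(scalar: Any) -> int:
    """Return len(scalar), or 0 if the scalar is null (hypothetical helper).

    Assumes `scalar` has an `is_valid` attribute and supports len() when valid.
    """
    if not scalar.is_valid:
        # Null scalar: report 0 instead of letting len() raise TypeError.
        return 0
    return len(scalar)


class FakeListScalar:
    """Hypothetical stand-in used only to exercise scalar_len without pyarrow."""

    def __init__(self, values):
        self.values = values
        self.is_valid = values is not None

    def __len__(self) -> int:
        # Reproduces the reported behavior: len(None) raises TypeError.
        return len(self.values)


valid = FakeListScalar([{"a": 1, "b": False}, {"a": 2, "b": True}])
missing = FakeListScalar(None)

try:
    len(missing)
except TypeError as exc:
    print("len() failed:", exc)  # len() failed: object of type 'NoneType' has no len()

print(scalar_len(valid), scalar_len(missing))  # 2 0
```

With pyarrow installed, `scalar_len(arr[1])` on the array from the first example would be expected to return 0, since it relies only on the scalar's `is_valid` property; whether `len()` itself should return 0 for a null ListScalar is the open question in this issue.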
--
This message was sent by Atlassian Jira
(v8.3.4#803005)