[ https://issues.apache.org/jira/browse/ARROW-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338409#comment-17338409 ]
Sergey Mozharov commented on ARROW-12609:
-----------------------------------------
To me it makes sense to wrap null into a ListScalar in this case. One benefit
of doing so is a simpler interface: the array type determines the type of the
scalar returned during iteration, so there is a single scalar type rather than
a union of a valid type and an invalid type. This is consistent with the
behavior of primitive arrays, which have one scalar type rather than
Union[Int32Scalar, NullScalar].
I would expect ListScalar<null> to behave identically to ListScalar<[]> (a
scalar wrapping an empty list), the only difference being that the first
scalar is invalid (scalar.is_valid -> False) while the second is valid. I
understand there may be many other nuances I am not aware of, and I am very
curious to know what alternatives others would suggest.
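A minimal pure-Python sketch of the proposed semantics, for illustration only. `ModelListScalar` is a hypothetical stand-in, not pyarrow's `ListScalar`; its `as_py` method merely mirrors the name of pyarrow's conversion method. A null slot is modeled as an empty list plus `is_valid = False`, so `len()` and iteration behave exactly as for a valid empty list.

```python
from typing import Any, Iterator, List, Optional


class ModelListScalar:
    """Hypothetical model of the proposed ListScalar semantics (not pyarrow's class)."""

    def __init__(self, values: Optional[List[Any]]) -> None:
        # A null slot is stored as an empty list plus an "invalid" flag,
        # so valid and missing values share one scalar type.
        self.is_valid = values is not None
        self._values: List[Any] = [] if values is None else list(values)

    def __len__(self) -> int:
        # Same code path for null and valid scalars: no TypeError.
        return len(self._values)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._values)

    def as_py(self) -> Optional[List[Any]]:
        # Only the validity flag distinguishes a null from [].
        return list(self._values) if self.is_valid else None


null_scalar = ModelListScalar(None)
empty_scalar = ModelListScalar([])

print(len(null_scalar), null_scalar.is_valid)    # 0 False
print(len(empty_scalar), empty_scalar.is_valid)  # 0 True
print(null_scalar.as_py(), empty_scalar.as_py())  # None []
```

The design choice being illustrated: the null information lives entirely in `is_valid`, so no caller needs a special case just to ask for the length.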
> TypeError when accessing length of an invalid ListScalar
> --------------------------------------------------------
>
> Key: ARROW-12609
> URL: https://issues.apache.org/jira/browse/ARROW-12609
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 3.0.0, 4.0.0
> Environment: Windows 10
> python=3.9.2
> pyarrow=4.0.0 (3.0.0 has the same behavior)
> Reporter: Sergey Mozharov
> Priority: Major
>
> For list-like data types, the scalar corresponding to a missing value has a
> '__len__' attribute, but calling len() on it raises a TypeError:
> {code:python}
> import pyarrow as pa
> data_type = pa.list_(pa.struct([
>     ('a', pa.int64()),
>     ('b', pa.bool_())
> ]))
> data = [[{'a': 1, 'b': False}, {'a': 2, 'b': True}], None]
> arr = pa.array(data, type=data_type)
> missing_scalar = arr[1]  # <pyarrow.ListScalar: None>
> assert hasattr(missing_scalar, '__len__')
> assert len(missing_scalar) == 0  # --> TypeError: object of type 'NoneType' has no len()
> {code}
> Expected behavior: the length should be 0.
> This causes several pandas unit tests to fail when building an ExtensionArray
> backed by an Arrow array of this data type.
> This behavior is also inconsistent with a similar example where the data type
> is a struct:
> {code:python}
> import pyarrow as pa
> data_type = pa.struct([
>     ('a', pa.int64()),
>     ('b', pa.bool_())
> ])
> data = [{'a': 1, 'b': False}, None]
> arr = pa.array(data, type=data_type)
> missing_scalar = arr[1]  # <pyarrow.StructScalar: None>
> assert hasattr(missing_scalar, '__len__')
> assert len(missing_scalar) == 0  # Ok
> {code}
> In this second example the TypeError is not raised.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)