[
https://issues.apache.org/jira/browse/ARROW-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Mozharov updated ARROW-12609:
------------------------------------
Description:
{{For list-like data types, the scalar corresponding to a missing value
has a '__len__' attribute, but a TypeError is raised when len() is called on it}}
{code:python}
import pyarrow as pa
data_type = pa.list_(pa.struct([
('a', pa.int64()),
('b', pa.bool_())
]))
data = [[{'a': 1, 'b': False}, {'a': 2, 'b': True}], None]
arr = pa.array(data, type=data_type)
missing_scalar = arr[1] # <pyarrow.ListScalar: None>
assert missing_scalar.type == arr[0].type # Valid scalar and missing scalar have the same type
assert hasattr(missing_scalar, '__len__')
assert len(missing_scalar) == 0 # --> TypeError: object of type 'NoneType' has no len()
{code}
Expected behavior: len(missing_scalar) returns 0.
This issue causes several pandas unit tests to fail when an ExtensionArray
backed by an Arrow array of this data type is built.
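Until the behavior changes, one possible workaround is to guard the length call on the scalar's validity. The sketch below is only an illustration: `safe_len` is a hypothetical helper name, not part of pyarrow, and `NullListStandIn` is a hypothetical stand-in class that imitates the failure described above so the snippet runs without pyarrow installed. It assumes only that scalars expose an `is_valid` attribute that is false for a missing value.

```python
def safe_len(scalar):
    """Return len(scalar), treating a missing (null) scalar as length 0.

    Hypothetical helper: objects without an `is_valid` attribute are
    assumed to be valid and are passed straight to len().
    """
    if not getattr(scalar, "is_valid", True):
        return 0
    return len(scalar)


class NullListStandIn:
    """Hypothetical stand-in imitating the reported null ListScalar behavior."""

    is_valid = False

    def __len__(self):
        # Imitates the report: the length is taken of None, so this
        # raises TypeError: object of type 'NoneType' has no len()
        return len(None)


print(safe_len(NullListStandIn()))  # the guard returns before len() is called
print(safe_len([1, 2]))             # plain sequences fall through to len()
```

With pyarrow installed, the same helper could presumably be applied to `arr[1]` from the example above in place of the stand-in.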
This behavior is also inconsistent with a similar example where the data type
is a struct:
{code:python}
import pyarrow as pa
data_type = pa.struct([
('a', pa.int64()),
('b', pa.bool_())
])
data = [{'a': 1, 'b': False}, None]
arr = pa.array(data, type=data_type)
missing_scalar = arr[1] # <pyarrow.StructScalar: None>
assert hasattr(missing_scalar, '__len__')
assert len(missing_scalar) == 0 # Ok
{code}
In this second example the TypeError is not raised.
was:
For List-like data types, the scalar type corresponding to a missing value has
'__len__' attribute, but TypeError is raised when it is accessed
```python
data_type = pa.list_(pa.struct([
('a', pa.int64()),
('b', pa.bool_())
]))
data = [[{'a': 1, 'b': False}, {'a': 2, 'b': True}], None]
arr = pa.array(data, type=data_type)
missing_scalar = arr[1] # <pyarrow.ListScalar: None>
assert hasattr(missing_scalar, '___len___')
assert len(missing_scalar) == 0 # --> TypeError: object of type 'NoneType' has no len()
```
Expected behavior: length is expected to be 0.
This issue causes several pandas unit tests to fail when an ExtensionArray
backed by arrow array with this data type is built.
This behavior is also inconsistent with a similar example where the data type
is a struct:
```python
data_type = pa.struct([
('a', pa.int64()),
('b', pa.bool_())
])
data = [{'a': 1, 'b': False}, None]
arr = pa.array(data, type=data_type)
missing_scalar = arr[1] # <pyarrow.StructScalar: None>
assert hasattr(missing_scalar, '___len___')
assert len(missing_scalar) == 0 # Ok
```
In this second example the TypeError is not raised.
> TypeError when accessing length of a ListScalar with list-like data type
> ------------------------------------------------------------------------
>
> Key: ARROW-12609
> URL: https://issues.apache.org/jira/browse/ARROW-12609
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 3.0.0, 4.0.0
> Environment: python=3.9.2
> pyarrow=4.0.0 (3.0.0 has the same behavior)
> Reporter: Sergey Mozharov
> Priority: Major