[ 
https://issues.apache.org/jira/browse/ARROW-15370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-15370:
------------------------------------------
    Description: 
Nightly integration tests with kartothek are failing, see e.g. 
https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true

This seems to be something on our side, and a recent regression (the builds only 
started failing today, and I don't see any other differences from the last 
working build yesterday).

Update, a reproducer:

{code}
In [4]: df = pd.DataFrame({'a': [1, 2], 'b': [0.1, 0.2]})

In [5]: table = pa.table(df)

In [6]: table.schema.empty_table().to_pandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-a03ecffc0af8> in <module>
----> 1 table.schema.empty_table().to_pandas()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    790 
    791     axes = [columns, index]
--> 792     return BlockManager(blocks, axes)
    793 
    794 

~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, verify_integrity)
    912                         pass
    913 
--> 914             self._verify_integrity()
    915 
    916     def _verify_integrity(self) -> None:

~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
    919         for block in self.blocks:
    920             if block.shape[1:] != mgr_shape[1:]:
--> 921                 raise construction_error(tot_items, block.shape[1:], self.axes)
    922         if len(self.items) != tot_items:
    923             raise AssertionError(

ValueError: Empty data passed with indices specified.
{code}

It happens specifically if the schema still has pandas metadata that describes 
a range for the index (we try to recreate that range, but its length doesn't 
match the actual length of the table).
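
To make the mismatch concrete, here is a minimal sketch in plain Python. It assumes the index descriptor has the shape pyarrow stores in the `index_columns` entry of the pandas metadata for a default RangeIndex (`kind`, `name`, `start`, `stop`, `step`). The helper `range_index_matches` is hypothetical and is not part of pyarrow or pandas.

```python
def range_index_matches(descriptor, num_rows):
    """Return True if the range described in the metadata has exactly num_rows entries."""
    if descriptor.get("kind") != "range":
        return True  # not a range descriptor, so there is no length to compare
    stored = range(descriptor["start"], descriptor["stop"], descriptor["step"])
    return len(stored) == num_rows


# Descriptor as it would be recorded for the two-row DataFrame in the reproducer
# (assumed layout, see the note above).
descriptor = {"kind": "range", "name": None, "start": 0, "stop": 2, "step": 1}

print(range_index_matches(descriptor, 2))  # True: the original two-row table
print(range_index_matches(descriptor, 0))  # False: empty table with the same schema
```

The second call corresponds to the failing case: the metadata still describes a two-element index while the table has no rows. Passing `ignore_metadata=True` to `to_pandas()` is likely to sidestep the problem, since the stored index description is then not used, though that has not been verified against the failing build.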

  was:
Nightly integration tests with kartothek are failing, see e.g. 
https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true

This seems to be something on our side, and a recent regression (the builds only 
started failing today, and I don't see any other differences from the last 
working build yesterday).

Update, a reproducer:

{code}
In [4]: df = pd.DataFrame({'a': [1, 2], 'b': [0.1, 0.2]})

In [5]: table = pa.table(df)

In [6]: table.schema.empty_table().to_pandas()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-a03ecffc0af8> in <module>
----> 1 table.schema.empty_table().to_pandas()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    790 
    791     axes = [columns, index]
--> 792     return BlockManager(blocks, axes)
    793 
    794 

~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, verify_integrity)
    912                         pass
    913 
--> 914             self._verify_integrity()
    915 
    916     def _verify_integrity(self) -> None:

~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
    919         for block in self.blocks:
    920             if block.shape[1:] != mgr_shape[1:]:
--> 921                 raise construction_error(tot_items, block.shape[1:], self.axes)
    922         if len(self.items) != tot_items:
    923             raise AssertionError(

ValueError: Empty data passed with indices specified.
{code}


> [Python] Regression in empty table to_pandas conversion
> -------------------------------------------------------
>
>                 Key: ARROW-15370
>                 URL: https://issues.apache.org/jira/browse/ARROW-15370
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Blocker
>             Fix For: 7.0.0
>
>
> Nightly integration tests with kartothek are failing, see e.g. 
> https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true
> This seems to be something on our side, and a recent regression (the builds 
> only started failing today, and I don't see any other differences from the 
> last working build yesterday).
> Update, a reproducer:
> {code}
> In [4]: df = pd.DataFrame({'a': [1, 2], 'b': [0.1, 0.2]})
> In [5]: table = pa.table(df)
> In [6]: table.schema.empty_table().to_pandas()
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-6-a03ecffc0af8> in <module>
> ----> 1 table.schema.empty_table().to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
>     790 
>     791     axes = [columns, index]
> --> 792     return BlockManager(blocks, axes)
>     793 
>     794 
> ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, verify_integrity)
>     912                         pass
>     913 
> --> 914             self._verify_integrity()
>     915 
>     916     def _verify_integrity(self) -> None:
> ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
>     919         for block in self.blocks:
>     920             if block.shape[1:] != mgr_shape[1:]:
> --> 921                 raise construction_error(tot_items, block.shape[1:], self.axes)
>     922         if len(self.items) != tot_items:
>     923             raise AssertionError(
> ValueError: Empty data passed with indices specified.
> {code}
> It happens specifically if the schema still has pandas metadata that 
> describes a range for the index (we try to recreate that range, but its 
> length doesn't match the actual length of the table).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
