jorisvandenbossche commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-649631989
Some work remains:
Tests still need to be added for the different filesystems that can be passed.
There are still some skipped tests:
* `ARROW:schema` is not yet removed from the metadata -> ARROW-9009
* Partition fields as dictionary keys
* Specifying `metadata` object (not very important IMO)
One of the `large_memory` tests is also failing (`test_binary_array_overflow_to_chunked`):
```
$ pytest python/pyarrow/tests/test_parquet.py -v -r s -m large_memory --enable-large_memory
============================= test session starts =============================
platform linux -- Python 3.7.3, pytest-5.2.1, py-1.8.0, pluggy-0.12.0 -- /home/joris/miniconda3/envs/arrow-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'dev' -> max_examples=10, database=DirectoryBasedExampleDatabase('/home/joris/scipy/repos/arrow/.hypothesis/examples')
rootdir: /home/joris/scipy/repos/arrow/python, inifile: setup.cfg
plugins: hypothesis-4.47.5, lazy-fixture-0.6.1
collected 277 items / 273 deselected / 4 selected
python/pyarrow/tests/test_parquet.py::test_large_table_int32_overflow PASSED [ 25%]
python/pyarrow/tests/test_parquet.py::test_byte_array_exactly_2gb PASSED [ 50%]
python/pyarrow/tests/test_parquet.py::test_binary_array_overflow_to_chunked FAILED [ 75%]
python/pyarrow/tests/test_parquet.py::test_list_of_binary_large_cell PASSED [100%]
================================== FAILURES ===================================
____________________ test_binary_array_overflow_to_chunked ____________________
assert t.equals(result)
    @pytest.mark.pandas
    @pytest.mark.large_memory
    def test_binary_array_overflow_to_chunked():
        # ARROW-3762
        # 2^31 + 1 bytes
        values = [b'x'] + [
            b'x' * (1 << 20)
        ] * 2 * (1 << 10)
>       df = pd.DataFrame({'byte_col': values})
python/pyarrow/tests/test_parquet.py:3043:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python/pyarrow/tests/test_parquet.py:3010: in _simple_table_roundtrip
stream = pa.BufferOutputStream()
python/pyarrow/tests/test_parquet.py:82: in _read_table
return pq.read_table(*args, **kwargs)
python/pyarrow/parquet.py:1555: in read_table
raise ValueError(
python/pyarrow/parquet.py:1468: in read
use_threads=use_threads
pyarrow/_dataset.pyx:403: in pyarrow._dataset.Dataset.to_table
???
pyarrow/_dataset.pyx:1893: in pyarrow._dataset.Scanner.to_table
???
pyarrow/error.pxi:122: in pyarrow.lib.pyarrow_internal_check_status
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E pyarrow.lib.ArrowNotImplementedError: This class cannot yet iterate chunked arrays
pyarrow/error.pxi:105: ArrowNotImplementedError
======== 1 failed, 3 passed, 273 deselected in 512.87s (0:08:32) ========
```
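For context on why this test ends up in the chunked-array code path: the values built in the test add up to 2^31 + 1 bytes, while Arrow's `binary` type uses 32-bit signed offsets, so a single array can address at most 2^31 - 1 bytes of data. The column therefore presumably cannot come back as one contiguous `binary` array. A quick check of the arithmetic (a sketch only; the variable names are chosen here for illustration and are not taken from the test suite):

```python
# Size of the payload built in test_binary_array_overflow_to_chunked:
# one 1-byte value plus 2 * 2**10 values of 2**20 bytes each.
n_large_values = 2 * (1 << 10)   # 2048 values
large_value_size = 1 << 20       # 1 MiB per value
total_bytes = 1 + n_large_values * large_value_size

# Largest data offset representable with 32-bit signed offsets,
# which is what Arrow's `binary` type uses.
INT32_MAX = (1 << 31) - 1

exceeds_single_array = total_bytes > INT32_MAX
print(total_bytes, exceeds_single_array)
```

The total is two bytes past the largest 32-bit signed offset, which matches the `# 2^31 + 1 bytes` comment in the test.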