[
https://issues.apache.org/jira/browse/ARROW-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695179#comment-16695179
]
Justin Lewis commented on ARROW-3843:
-------------------------------------
I was able to get a dev environment up and running on OS X after a few tries on
Linux (where I think I ran into conda ABI issues).
I'm not too familiar with the code. The call to close() on this line is where
I started debugging:
[https://github.com/apache/arrow/blob/c04a62b9579d420da32ed6d962d92266508e6abe/python/pyarrow/_parquet.pyx#L916]
I believe the FileWriter is implemented here:
[https://github.com/apache/arrow/blob/c04a62b9579d420da32ed6d962d92266508e6abe/cpp/src/parquet/arrow/writer.cc#L938]
For now I'm trying to understand whether the writer can handle writing an empty
file. My assumption is that the writer expects to write something and gets
tripped up when nothing is written.
I'm adding this both as a note to myself and in case somebody can point me in
the right direction.
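To make the hypothesis above concrete, here is a toy Python model of a writer
that accepts zero rows but rejects a schema with zero columns when it is
closed. This is only a sketch of the suspected failure mode, not the actual
Arrow or parquet-cpp code: ToySchemaRoot, ToyWriter and try_write are made-up
names, and the placement of the check inside close() is an assumption based on
the traceback quoted below.

```python
class ToySchemaRoot:
    """Hypothetical stand-in for a Parquet schema root (a group of columns)."""

    def __init__(self, column_names):
        self.children = list(column_names)


class ToyWriter:
    """Hypothetical writer that validates its schema only when closed."""

    def __init__(self, root):
        self.root = root
        self.rows_written = 0
        self.closed = False

    def write_rows(self, n_rows):
        # Writing zero rows is accepted; nothing is validated at this point.
        self.rows_written += n_rows

    def close(self):
        # Assumed check: a root with no columns is rejected at close time.
        if not self.root.children:
            raise IOError("Root node did not have children")
        self.closed = True


def try_write(column_names, n_rows=0):
    """Return None on success, or the error message if close() fails."""
    writer = ToyWriter(ToySchemaRoot(column_names))
    writer.write_rows(n_rows)
    try:
        writer.close()
    except IOError as exc:
        return str(exc)
    return None


# Zero rows but one column (assumed: e.g. a preserved index column).
print(try_write(["index_column"]))
# Zero rows and zero columns (the preserve_index=False case in this toy).
print(try_write([]))
```

If the real writer behaves like this toy, the question is less about empty
files in general and more about tables that have no columns at all.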
> [Python] Writing Parquet file from empty table created with
> Table.from_pandas(..., preserve_index=False) fails
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-3843
> URL: https://issues.apache.org/jira/browse/ARROW-3843
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Environment: conda list --explicit
> # This file may be used to create an environment using:
> # $ conda create --name <env> --file <this file>
> # platform: linux-64
> @EXPLICIT
> https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2018.10.15-ha4d7672_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-7.2.0-hdf63c60_3.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libgfortran-3.0.0-1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-7.2.0-hdf63c60_3.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.6-h470a237_2.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/icu-58.2-hfc679d8_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-hfc679d8_5.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.15-h470a237_3.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.1-hfc679d8_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/openblas-0.3.3-ha44fe06_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/openssl-1.0.2p-h470a237_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.4-h470a237_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/yaml-0.1.7-h470a237_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h470a237_3.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/blas-1.1-openblas.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/boost-cpp-1.68.0-h3a22d5f_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20170329-haf1bffa_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/readline-7.0-haf1bffa_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.9-ha92aebf_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/sqlite-3.25.3-hb1c47c0_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/unixodbc-2.3.7-h09ba92c_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/python-3.6.6-h5001a0f_3.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/atomicwrites-1.2.1-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/attrs-18.2.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/backcall-0.1.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/certifi-2018.10.15-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/click-7.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/decorator-4.3.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/ipython_genutils-0.2.0-py_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/numpy-1.15.4-py36_blas_openblashb06ca3d_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/parso-0.3.1-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pickleshare-0.7.5-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/pluggy-0.8.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/ptyprocess-0.6.0-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/py-1.7.0-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/pytz-2018.7-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pyyaml-3.13-py36h470a237_1.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/wcwidth-0.1.7-py_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/arrow-cpp-0.11.1-py36h3bd774a_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/jedi-0.13.1-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/more-itertools-4.3.0-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pexpect-4.6.0-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.7.5-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/setuptools-40.6.2-py36_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/traitlets-4.3.2-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pandas-0.23.4-py36hf8a1672_0.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/parquet-cpp-1.5.1-2.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/pygments-2.2.0-py_1.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pytest-4.0.0-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/wheel-0.32.3-py36_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pip-18.1-py36_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/noarch/prompt_toolkit-2.0.7-py_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/pyarrow-0.11.1-py36hfc679d8_0.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/ipython-7.1.1-py36h24bf2e0_1000.tar.bz2
> https://conda.anaconda.org/conda-forge/linux-64/turbodbc-3.0.0-py36h38e7a2c_0.tar.bz2
> Reporter: Justin Lewis
> Priority: Minor
> Labels: parquet
> Fix For: 0.13.0
>
>
> {code:java}
> import pandas as pd
> import pyarrow.parquet as pq
> import pyarrow as pa
> def test_write_empty_preserve_index():
>     # passes
>     df = pd.DataFrame()
>     table = pa.Table.from_pandas(df, preserve_index=True)
>     pq.write_table(table, 'test1.parquet')
>     table2 = pq.read_table('test1.parquet')
>     df2 = table2.to_pandas()
>     pd.util.testing.assert_frame_equal(df, df2)
>
> def test_write_empty_no_preserve_index():
>     df = pd.DataFrame()
>     table = pa.Table.from_pandas(df, preserve_index=False)
>     # fails here
>     pq.write_table(table, 'test2.parquet')
>     table2 = pq.read_table('test2.parquet')
>     df2 = table2.to_pandas()
>     pd.util.testing.assert_frame_equal(df, df2)
> {code}
>
> First test passes. Second one fails with this:
>
> {code:java}
> ___________________________________ test_write_empty_no_preserve_index ___________________________________
>     def test_write_empty_no_preserve_index():
>         df = pd.DataFrame()
>         table = pa.Table.from_pandas(df, preserve_index=False)
>         # fails here
> >       pq.write_table(table, 'test2.parquet')
> test_empty.py:24:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:1125: in write_table
>     writer.write_table(table, row_group_size=row_group_size)
> ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:361: in __exit__
>     self.close()
> ../.conda/envs/pedlenv/lib/python3.6/site-packages/pyarrow/parquet.py:380: in close
>     self.writer.close()
> pyarrow/_parquet.pyx:916: in pyarrow._parquet.ParquetWriter.close
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   ???
> E   pyarrow.lib.ArrowIOError: Root node did not have children
> pyarrow/error.pxi:83: ArrowIOError
> {code}
>
> I haven't had a chance to investigate, but this does not seem to be the desired behavior.
>
>