[ https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497104#comment-16497104 ]
Andy Reagan commented on ARROW-2654:
------------------------------------
I upgraded to the latest version of pandas (0.23.0), now with
pyarrow==0.9.0.post1, and I get essentially the same error:
{code:java}
Traceback (most recent call last):
  File "src/data/CLXP_pull.py", line 214, in <module>
    main()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "src/data/CLXP_pull.py", line 188, in main
    results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 288, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 131, in read
    **kwargs).to_pandas()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
    pf = ParquetFile(source, metadata=metadata)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
    self.reader.open(source, metadata=metadata)
  File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
{code}
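For anyone trying to reproduce this, the details that matter here (installed package versions, platform, and the size of the file) can be gathered in one place. The following is only a minimal sketch using the standard library; the helper name `describe_environment` and its default package list are hypothetical and not part of pandas or pyarrow.

```python
import os
import platform
from importlib import metadata


def describe_environment(path, packages=("pandas", "pyarrow")):
    """Collect environment details worth attaching to a report like this one.

    `describe_environment` is a hypothetical helper written for illustration.
    """
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        # Size in bytes, or None when the file does not exist.
        "file_size_bytes": os.path.getsize(path) if os.path.exists(path) else None,
    }
    for name in packages:
        try:
            # Version of the installed distribution, as pip would report it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info
```

Calling `describe_environment("data/raw/some_file.parquet")` (the path is a placeholder) returns a dict that can be pasted into the issue alongside the traceback.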
> pyarrow cannot load a file it created
> -------------------------------------
>
> Key: ARROW-2654
> URL: https://issues.apache.org/jira/browse/ARROW-2654
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Andy Reagan
> Priority: Major
>
> I saved a file using the pandas to_parquet method, but I can't read it back in.
> Here's the full stack trace:
>
> {code:java}
> Traceback (most recent call last):
>   File "src/data/CLXP_pull.py", line 214, in <module>
>     main()
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
>     return self.main(*args, **kwargs)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
>     rv = self.invoke(ctx)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
>     return ctx.invoke(self.callback, **ctx.params)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
>     return callback(*args, **kwargs)
>   File "src/data/CLXP_pull.py", line 188, in main
>     results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 257, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 130, in read
>     **kwargs).to_pandas()
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
>     pf = ParquetFile(source, metadata=metadata)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
>     self.reader.open(source, metadata=metadata)
>   File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
> {code}
> Any ideas what could cause this? The file itself is 3.6GB.
> I'm running pandas==0.22.0 and arrow==0.12.1.