[ https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497104#comment-16497104 ]
Andy Reagan commented on ARROW-2654:
------------------------------------

I upgraded to the latest version of Pandas (0.23.0), which now has pyarrow==0.9.0.post1, and I get basically the same
{code:java}
Traceback (most recent call last):
  File "src/data/CLXP_pull.py", line 214, in <module>
    main()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "src/data/CLXP_pull.py", line 188, in main
    results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 288, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 131, in read
    **kwargs).to_pandas()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
    pf = ParquetFile(source, metadata=metadata)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
    self.reader.open(source, metadata=metadata)
  File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
{code}

> pyarrow cannot load a file it created
> -------------------------------------
>
>                 Key: ARROW-2654
>                 URL: https://issues.apache.org/jira/browse/ARROW-2654
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Andy Reagan
>            Priority: Major
>
> I saved a file using pandas to_parquet method, but can't read it back in.
> Here's the full stack trace:
>
> {code:java}
> Traceback (most recent call last):
>   File "src/data/CLXP_pull.py", line 214, in <module>
>     main()
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
>     return self.main(*args, **kwargs)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
>     rv = self.invoke(ctx)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
>     return ctx.invoke(self.callback, **ctx.params)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
>     return callback(*args, **kwargs)
>   File "src/data/CLXP_pull.py", line 188, in main
>     results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 257, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 130, in read
>     **kwargs).to_pandas()
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
>     pf = ParquetFile(source, metadata=metadata)
>   File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
>     self.reader.open(source, metadata=metadata)
>   File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
> {code}
> Any ideas what could cause this? The file itself is 3.6GB.
> I'm running pandas==0.22.0 and arrow==0.12.1.
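
For a report like this, one cheap first check is whether the file on disk looks like a complete Parquet file at all, independent of pandas or pyarrow. The sketch below is only an illustration, not something from the issue: `inspect_parquet_file` is a hypothetical helper name, and it assumes a plain (unencrypted) Parquet file, which begins and ends with the 4-byte magic `PAR1`. It reads eight bytes in total, so the 3.6GB size does not matter here. It cannot say why the read failed; it only helps separate "the file was written incompletely" from "the reader fails on a well-formed file".

```python
import os

# Plain Parquet files start and end with this 4-byte magic marker.
PARQUET_MAGIC = b"PAR1"


def inspect_parquet_file(path):
    """Return (size_in_bytes, header_ok, footer_ok) for the file at `path`.

    Only the first and last four bytes are read, so this is safe to run
    on very large files.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        if size >= 4:
            # Jump to the last four bytes of the file.
            f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return size, head == PARQUET_MAGIC, tail == PARQUET_MAGIC
```

It could be called with the same path that is passed to `pd.read_parquet` in the traceback; if either flag comes back `False`, the write most likely did not finish cleanly.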