[ https://issues.apache.org/jira/browse/ARROW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Reagan updated ARROW-2654:
-------------------------------
    Description: 
I saved a file using the pandas to_parquet method but can't read it back in.
Here's the full stack trace:

 
{code:java}
Traceback (most recent call last):
  File "src/data/CLXP_pull.py", line 214, in <module>
    main()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "src/data/CLXP_pull.py", line 188, in main
    results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 257, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 130, in read
    **kwargs).to_pandas()
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
    pf = ParquetFile(source, metadata=metadata)
  File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
    self.reader.open(source, metadata=metadata)
  File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
{code}
Any ideas what could cause this? The file itself is 3.6GB.

I'm running pandas==0.22.0 and arrow==0.12.1.
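A possible first diagnostic, offered as a sketch rather than a known fix: the exception is raised from `ParquetReader.open`, which (as far as we can tell) is where pyarrow reads the file's footer metadata, so it is worth confirming that the file `to_parquet` wrote is framed correctly. The check below uses only the standard library and assumes the standard Parquet layout: the 4-byte magic `PAR1` at the start and end of the file, with a 4-byte little-endian footer length just before the trailing magic. `inspect_parquet_framing` is a hypothetical helper written for this report, not a pandas or pyarrow API. It also flags whether the file is larger than 2**31 - 1 bytes. At 3.6GB this file is well past that, and some platforms (reportedly macOS, which the `/Users/` paths in the trace suggest) reject a single `read()` of more than INT_MAX bytes with EINVAL; that is an unconfirmed hypothesis for this issue, not a diagnosis.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # assumed: standard Parquet magic bytes at both ends
INT_MAX = 2**31 - 1      # largest byte count a signed 32-bit int can hold


def inspect_parquet_framing(path):
    """Report on the container framing of a Parquet file at ``path``.

    Hypothetical helper: checks only the leading/trailing magic and the
    little-endian footer length. It does not decode the Thrift metadata.
    """
    size = os.path.getsize(path)
    report = {
        "size_bytes": size,
        "exceeds_int_max": size > INT_MAX,
        "head_magic_ok": False,
        "tail_magic_ok": False,
        "footer_len": None,
        "footer_len_plausible": False,
    }
    # Head magic (4) + footer length (4) + tail magic (4) need 12 bytes.
    if size < 12:
        return report
    with open(path, "rb") as f:
        report["head_magic_ok"] = f.read(4) == PARQUET_MAGIC
        # The last 8 bytes are: 4-byte footer length, then 4-byte magic.
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    footer_len = struct.unpack("<I", tail[:4])[0]
    report["footer_len"] = footer_len
    report["tail_magic_ok"] = tail[4:] == PARQUET_MAGIC
    # The footer must fit between the head magic and the last 8 bytes.
    report["footer_len_plausible"] = 0 < footer_len <= size - 12
    return report
```

For example, calling `inspect_parquet_framing(os.path.join(project_dir, "data", "raw", fullname + ".parquet"))` would inspect the same path the script passes to `pd.read_parquet`. If both magic checks pass and the footer length is plausible, the file is at least not truncated, which would suggest looking at the read side rather than the write side.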


> pyarrow cannot load a file it created
> -------------------------------------
>
>                 Key: ARROW-2654
>                 URL: https://issues.apache.org/jira/browse/ARROW-2654
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Andy Reagan
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
