[
https://issues.apache.org/jira/browse/ARROW-12733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342470#comment-17342470
]
Torstein Sørnes edited comment on ARROW-12733 at 5/11/21, 10:21 AM:
--------------------------------------------------------------------
[~jorisvandenbossche] [~apitrou]
The file is lz4-compressed and contains a pandas dataframe. The code that writes it is:
{code:python}
df.to_feather(path, compression='lz4'){code}
where df is a pandas dataframe.
The file was written with the same versions of pyarrow and pandas that are now being used to read it.
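For context, the whole round trip looks roughly like this (the dataframe and path below are placeholders, not my real data):
{code:python}
import pandas as pd

# Placeholder dataframe and path standing in for the real data and file.
df = pd.DataFrame({"title": ["a", "b"], "description": ["x", "y"]})
path = "jobs1.feather.011"

# Write, lz4-compressed (pandas delegates to pyarrow.feather under the hood).
df.to_feather(path, compression='lz4')

# Read it back -- this is the call that raises
# "OSError: Invalid IPC stream: negative continuation token"
# for the one problematic file.
df2 = pd.read_feather(path)
{code}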
I have successfully written and read hundreds of pandas dataframe Arrow files with exactly the same code and library versions, so I have no idea why this particular file fails.
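In case it helps narrow things down, here is a quick sanity check I can run on the file. As far as I understand, a Feather V2 file is an Arrow IPC file and should start and end with the 6-byte magic "ARROW1"; if either is missing, the file is probably truncated or corrupted on disk rather than hitting a reader bug (this is just my rough understanding of the format, not an authoritative check):
{code:python}
# Check for the "ARROW1" magic bytes at the start and end of the file.
path = "/home/ftdb/data/jobs1.feather.011"

with open(path, "rb") as f:
    head = f.read(6)
    f.seek(-6, 2)  # seek to 6 bytes before the end of the file
    tail = f.read(6)

print("starts with ARROW1:", head == b"ARROW1")
print("ends with ARROW1:  ", tail == b"ARROW1")
{code}
Happy to run this (or anything else) against the file and report back.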
The file is too big to upload here. Does this link work?
[https://ml-pull.s3.eu-central-1.amazonaws.com/Jobs/glassdoor/jobs1.feather.011]
Cheers, and thanks for your work.
> OSError: Invalid IPC stream: negative continuation token
> --------------------------------------------------------
>
> Key: ARROW-12733
> URL: https://issues.apache.org/jira/browse/ARROW-12733
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Torstein Sørnes
> Priority: Major
>
> pyarrow 4.0.0
>
> {code:python}
>
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pandas/io/feather_format.py", line 127, in read_feather
> 84 def read_feather(
> 85 path, columns=None, use_threads: bool = True, storage_options: StorageOptions = None
> 86 ):
> (...)
> 123 with get_handle(
> 124 path, "rb", storage_options=storage_options, is_text=False
> 125 ) as handles:
> 126
> --> 127 return feather.read_feather(
> 128 handles.handle, columns=columns, use_threads=bool(use_threads)
> ..................................................
> path = PosixPath('/home/ftdb/data/jobs1.feather.011')
> columns = None
> use_threads = True
> storage_options = None
> StorageOptions = typing.Union[typing.Dict[str, typing.Any], NoneType]
> handles = IOHandles(handle=<_io.BufferedReader name='/home/ftdb/data/jobs1.feather.011'>, compression={'method': None}, created_handles=[], is_wrapped=False, is_mmap=False)
> feather.read_feather = <function 'read_feather' feather.py:195>
> handles.handle = <_io.BufferedReader name='/home/ftdb/data/jobs1.feather.011'>
> ..................................................
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 216, in read_feather
> 195 def read_feather(source, columns=None, use_threads=True, memory_map=True):
> (...)
> 212 -------
> 213 df : pandas.DataFrame
> 214 """
> 215 _check_pandas_version()
> --> 216 return (read_table(source, columns=columns, memory_map=memory_map)
> 217 .to_pandas(use_threads=use_threads))
> ..................................................
> source = <_io.BufferedReader name='/home/ftdb/data/jobs1.feather.011'>
> columns = None
> use_threads = True
> memory_map = True
> ..................................................
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 241, in read_table
> 220 def read_table(source, columns=None, memory_map=True):
> (...)
> 237 reader = ext.FeatherReader()
> 238 reader.open(source, use_memory_map=memory_map)
> 239
> 240 if columns is None:
> --> 241 return reader.read()
> 242
> ..................................................
> source = <_io.BufferedReader name='/home/ftdb/data/jobs1.feather.011'>
> columns = None
> memory_map = True
> reader = <pyarrow.lib.FeatherReader object at 0x7f5f0afc2180>
> ext.FeatherReader = <class 'pyarrow.lib.FeatherReader'>
> ..................................................
> File "pyarrow/feather.pxi", line 76, in pyarrow.lib.FeatherReader.read
> File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
> ---- (full traceback above) ----
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
> File "<ipython-input-4-a3c2240634fd>", line 1, in <module>
> df = read_feather(p)
> File "/storage/code/fintechdb/Ftools/ftools/functoolz.py", line 22, in inner
> return func(*args, **kwargs)
> File "/storage/code/fintechdb/Ftools/ftools/pathtools.py", line 51, in inner
> return func(**new_kwargs)
> File "/storage/code/fintechdb/Ftools/ftools/io.py", line 506, in read_feather
> data = pd.read_feather(path, columns, use_threads=True)
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pandas/io/feather_format.py", line 127, in read_feather
> return feather.read_feather(
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 216, in read_feather
> return (read_table(source, columns=columns, memory_map=memory_map)
> File "/home/ftdb/anaconda3/lib/python3.8/site-packages/pyarrow/feather.py", line 241, in read_table
> return reader.read()
> File "pyarrow/feather.pxi", line 76, in pyarrow.lib.FeatherReader.read
> File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
> OSError: Invalid IPC stream: negative continuation token
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)