thatlittleboy opened a new issue, #12899:
URL: https://github.com/apache/arrow/issues/12899

   Consider the following example with pandas:
   
   ```python
   [ins] In [11]: df = pd.DataFrame({
             ...:     "cat1": pd.Categorical(["a", "b", "a"]),
             ...:     "cat2": pd.cut(range(1, 10, 3), [-1, 5, 10]),
             ...: })
   
   [ins] In [14]: df['cat2'].cat.categories
   Out[14]: IntervalIndex([(-1, 5], (5, 10]], dtype='interval[int64, right]')
   ```
   
   The categorical column `cat2` has interval-valued categories (an `IntervalIndex`).
   
   I can write the dataframe to a Feather file without issues, but reading it back throws an `ArrowInvalid` error:
   
   ```python
   [ins] In [19]: feather.write_feather(df, "test.feather")
   
   [ins] In [20]: feather.read_feather("test.feather")
   ---------------------------------------------------------------------------
   ArrowInvalid                              Traceback (most recent call last)
   Input In [20], in <cell line: 1>()
   ----> 1 feather.read_feather("test.feather")
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:220, in read_feather(source, columns, use_threads, memory_map)
       198 """
       199 Read a pandas.DataFrame from Feather format. To read as pyarrow.Table use
       200 feather.read_table.
      (...)
       217 df : pandas.DataFrame
       218 """
       219 _check_pandas_version()
   --> 220 return (read_table(
       221     source, columns=columns, memory_map=memory_map,
       222     use_threads=use_threads).to_pandas(use_threads=use_threads))
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:248, in read_table(source, columns, memory_map, use_threads)
       244 reader = _feather.FeatherReader(
       245     source, use_memory_map=memory_map, use_threads=use_threads)
       247 if columns is None:
   --> 248     return reader.read()
       250 column_types = [type(column) for column in columns]
       251 if all(map(lambda t: t == int, column_types)):
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/_feather.pyx:88, in pyarrow._feather.FeatherReader.read()
   
   File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/error.pxi:99, in pyarrow.lib.check_status()
   
   ArrowInvalid: Ran out of field metadata, likely malformed
   ```
   
   The error only occurs with the `cat2` (`category[interval]`) column. Ordinary categorical columns like `cat1` in my example are read back without issues.
   I note that Interval types are supposedly supported ([here](https://github.com/apache/arrow/blob/master/docs/source/status.rst)), so is this a bug, or am I misunderstanding something and the error is expected?
   
   
   ## versions
   
   python 3.9.10
   pandas==1.4.2
   pyarrow==7.0.0
   macOS 12.2
   

