[ 
https://issues.apache.org/jira/browse/ARROW-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Reback updated ARROW-1285:
-------------------------------
    Summary: PYTHON: NotImplemented exception creates empty parquet file (was: NotImplemented exception creates empty parquet file)

> PYTHON: NotImplemented exception creates empty parquet file
> -----------------------------------------------------------
>
>                 Key: ARROW-1285
>                 URL: https://issues.apache.org/jira/browse/ARROW-1285
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>            Reporter: Jeff Reback
>            Priority: Minor
>
> The call below correctly raises (categorical is not implemented), but it 
> still leaves an empty file on disk.
> xref 
> https://github.com/pandas-dev/pandas/pull/15838#pullrequestreview-52576290
> {code}
> In [2]:    df = pd.DataFrame({'a': list('abc'),
>    ...:                       'b': list(range(1, 4)),
>    ...:                       'c': np.arange(3, 6).astype('u1'),
>    ...:                       'd': np.arange(4.0, 7.0, dtype='float64'),
>    ...:                       'e': [True, False, True],
>    ...:                       'f': pd.Categorical(list('abc')),
>    ...:                       'g': pd.date_range('20130101', periods=3),
>    ...:                       'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
>    ...:                       'i': pd.date_range('20130101', periods=3, freq='ns')})
>    ...: 
> In [3]: df.to_parquet('foo.pq')
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-3-8070fb7e3e2c> in <module>()
> ----> 1 df.to_parquet('foo.pq')
> /Users/jreback/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
>    1620         from pandas.io.parquet import to_parquet
>    1621         to_parquet(self, fname, engine,
> -> 1622                    compression=compression, **kwargs)
>    1623 
>    1624     @Substitution(header='Write out column names. If a list of string is given, \
> /Users/jreback/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
>     152         raise ValueError("parquet must have string column names")
>     153 
> --> 154     return impl.write(df, path, compression=compression)
>     155 
>     156 
> /Users/jreback/pandas/pandas/io/parquet.py in write(self, df, path, compression, **kwargs)
>      53         table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
>      54         self.api.parquet.write_table(
> ---> 55             table, path, compression=compression, **kwargs)
>      56 
>      57     def read(self, path):
> /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
>     770         version=version,
>     771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
> --> 772     writer = ParquetWriter(where, table.schema, **options)
>     773     writer.write_table(table, row_group_size=row_group_size)
>     774     writer.close()
> _parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
> error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: NotImplemented: unhandled type
> In [4]: !ls -ltr foo.pq
> -rw-r--r--  1 jreback  staff  0 Jul 27 06:03 foo.pq
> {code}
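
The traceback shows the exception coming from ParquetWriter.__cinit__, and the listing shows a 0-byte foo.pq left behind. Until the writer cleans up after itself, a caller-side guard is one possible workaround. The following is only a sketch under assumptions: write_atomically and failing_writer are hypothetical names, not part of pandas or pyarrow, and failing_writer merely imitates the reported behaviour (create the file, then raise).

```python
import os
import tempfile


def write_atomically(path, write_fn):
    """Run write_fn(tmp_path) and move the result to `path` only on success.

    Hypothetical helper: if write_fn raises, the temporary file is removed
    and nothing is left at `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial (possibly empty) file, then re-raise unchanged.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def failing_writer(tmp_path):
    """Stand-in for the reported failure: the file is created, then the writer raises."""
    open(tmp_path, "wb").close()
    raise NotImplementedError("unhandled type")
```

With pyarrow, write_fn could be something like `lambda p: pyarrow.parquet.write_table(table, p)`. The original exception still propagates, but no empty foo.pq remains.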



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
