Hi Roman,

On Mon, Oct 28, 2019 at 5:56 AM <roman.karlstet...@gmail.com> wrote:
>
> Hi everyone,
>
>
>
> I have a question about the state of decimal support in Arrow when reading
> from/writing to Parquet.
>
> *       Is writing decimals to parquet supposed to work? Are there any
> examples on how to do this in C++?

Yes, it's supported. The details are here:

https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1511

> *       When reading decimals in a parquet file with pyarrow and converting
> the resulting table to a pandas dataframe, datatype in the cells is
> "object". As a consequence, performance when doing analysis on this table is
> suboptimal. Can I somehow directly get the decimals from the parquet file
> into floats/doubles in a pandas dataframe?

Some work will be required. The cleanest way would be to cast
decimal128 columns to float32/float64 prior to converting to pandas.

I didn't find an existing issue for this, so I opened

https://issues.apache.org/jira/browse/ARROW-7010

I also opened

https://issues.apache.org/jira/browse/ARROW-7011

about going the other way. This would be a useful thing to contribute
to the project.

Thanks
Wes

>
>
> Thanks in advance,
>
> Roman
>
>
>