hi Roman,

On Mon, Oct 28, 2019 at 5:56 AM <roman.karlstet...@gmail.com> wrote:
>
> Hi everyone,
>
> I have a question about the state of decimal support in Arrow when reading
> from/writing to Parquet.
>
> * Is writing decimals to parquet supposed to work? Are there any
> examples on how to do this in C++?
Yes, it's supported. The details are here:

https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1511

> * When reading decimals in a parquet file with pyarrow and converting
> the resulting table to a pandas dataframe, datatype in the cells is
> "object". As a consequence, performance when doing analysis on this table is
> suboptimal. Can I somehow directly get the decimals from the parquet file
> into floats/doubles in a pandas dataframe?

Some work will be required. The cleanest way would be to cast
decimal128 columns to float32/float64 prior to converting to pandas.

I didn't see an issue for this right away, so I opened

https://issues.apache.org/jira/browse/ARROW-7010

I also opened

https://issues.apache.org/jira/browse/ARROW-7011

about going the other way.

This would be a useful thing to contribute to the project.

Thanks
Wes

> Thanks in advance,
> Roman