tustvold commented on issue #3017:
URL: https://github.com/apache/arrow-rs/issues/3017#issuecomment-1304746170
I've narrowed this down to pyarrow not being able to read the converted type
correctly.
```
>>> import pyarrow.parquet as pq
>>> pq.ParquetFile('tmp.par').schema.column(0).converted_type
'NONE'
```
However, fastparquet is able to read the converted type, as it is correctly
encoded in the thrift definition (10 is TIMESTAMP_MICROS in the parquet
ConvertedType enum):
```
>>> from fastparquet import ParquetFile
>>> ParquetFile('tmp.par').schema.schema_element('col1').converted_type
10
```
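For readers comparing the integer above with the names other tools print, here is a minimal sketch of the `ConvertedType` numbering as I understand it from `parquet.thrift`. The `converted_type_name` helper is hypothetical, written only for illustration, and is not part of fastparquet or pyarrow.

```python
from enum import IntEnum


class ConvertedType(IntEnum):
    """Deprecated ConvertedType codes, transcribed from parquet.thrift."""
    UTF8 = 0
    MAP = 1
    MAP_KEY_VALUE = 2
    LIST = 3
    ENUM = 4
    DECIMAL = 5
    DATE = 6
    TIME_MILLIS = 7
    TIME_MICROS = 8
    TIMESTAMP_MILLIS = 9
    TIMESTAMP_MICROS = 10
    UINT_8 = 11
    UINT_16 = 12
    UINT_32 = 13
    UINT_64 = 14
    INT_8 = 15
    INT_16 = 16
    INT_32 = 17
    INT_64 = 18
    JSON = 19
    BSON = 20
    INTERVAL = 21


def converted_type_name(code):
    """Hypothetical helper: map a raw converted_type code to a readable name."""
    if code is None:
        # The field is optional in the thrift schema element.
        return "NONE"
    try:
        return ConvertedType(code).name
    except ValueError:
        return f"UNKNOWN({code})"


print(converted_type_name(10))
```

Passing the value returned by fastparquet to this helper gives the name to compare against what pyarrow and parquet-tools report.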
I also tried
https://github.com/xitongsys/parquet-go/tree/master/tool/parquet-tools, which
also reports the converted type:
```
./parquet-tools -cmd schema -file /home/raphael/repos/external/arrow-rs/parquet/tmp.par
{
  "Tag": "name=Schema, repetitiontype=REQUIRED",
  "Fields": [
    {
      "Tag": "name=Col1, type=INT64, convertedtype=TIMESTAMP_MICROS, repetitiontype=REQUIRED"
    }
  ]
}
```
This leads me to think that this is actually a bug in pyarrow, and therefore
in the C++ Arrow implementation. Perhaps you might like to raise a bug there?