jorisvandenbossche opened a new issue, #42016:
URL: https://github.com/apache/arrow/issues/42016
Reading and writing data with a float16 field works just fine (because it's
implemented on the C++ side):
```python
>>> import numpy as np
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> table = pa.table({"a": np.array([0.1, 0.2], "float16"), "b": np.array([1, 2], "int8")})
>>> pq.write_table(table, "/tmp/test_float16.parquet")
>>> meta = pq.read_metadata("/tmp/test_float16.parquet")
>>> meta.schema
<pyarrow._parquet.ParquetSchema object at 0x7f488ec55000>
required group field_id=-1 schema {
optional fixed_len_byte_array(2) field_id=-1 a (Float16);
optional int32 field_id=-1 b (Int(bitWidth=8, isSigned=true));
}
```
But in a few parts of the API you can see that we didn't add it to the Python
bindings:
```python
>>> meta.schema.column(0).logical_type
<pyarrow._parquet.ParquetLogicalType object at 0x7f488ef60210>
Float16
>>> meta.schema.column(1).logical_type
<pyarrow._parquet.ParquetLogicalType object at 0x7f4894c9fcf0>
Int(bitWidth=8, isSigned=true)
>>> meta.schema.column(0).logical_type.type
'UNKNOWN'  # <--- UNKNOWN instead of FLOAT16 here
>>> meta.schema.column(1).logical_type.type
'INT'
```
That comes from
https://github.com/apache/arrow/blob/374b8f6ddec3b7614408ea874ffb29981c2a295d/python/pyarrow/_parquet.pyx#L1290-L1306
(which might actually be the only place where it needs to be added).
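Conceptually, the fix at the linked lines would amount to one extra entry in the enum-to-name mapping, so that the Float16 id no longer falls through to `'UNKNOWN'`. A pseudocode sketch only; the identifiers below are hypothetical, not the actual names in `_parquet.pyx`:

```
# hypothetical enum-to-name mapping in the Python bindings
logical_type_names = {
    LogicalTypeId_INT:     'INT',      # existing entry (illustrative)
    LogicalTypeId_FLOAT16: 'FLOAT16',  # the missing entry for Float16
}
# lookup falls back to 'UNKNOWN' when the id is absent, which is
# what produces the behaviour shown above
```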