Jorge Leitão created ARROW-12201:
------------------------------------
Summary: [C++] [Parquet] Writing uint32 does not preserve
parquet's LogicalType
Key: ARROW-12201
URL: https://issues.apache.org/jira/browse/ARROW-12201
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, C++
Affects Versions: 3.0.0
Reporter: Jorge Leitão
When writing a `uint32` column, the corresponding Parquet logical type is not written, limiting interoperability with other engines.
Minimal Python example:
```
import pyarrow as pa
import pyarrow.parquet as pq

data = {"uint32": [1, None, 0]}
schema = pa.schema([pa.field("uint32", pa.uint32())])
t = pa.table(data, schema=schema)
pq.write_table(t, "bla.parquet")
```
Inspecting the file with Spark:
```
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("bla.parquet")
print(df.select("uint32").schema)
```
shows `StructType(List(StructField(uint32,LongType,true)))`. `LongType`
indicates that the field was interpreted as a signed 64-bit integer. Further
inspection shows that neither `convertedType` nor `logicalType` is being set.
Note that this is independent of the Arrow-specific schema written in
the metadata.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)