Jorge Leitão created ARROW-12201:
------------------------------------

             Summary: [C++] [Parquet] Writing uint32 does not preserve 
parquet's LogicalType
                 Key: ARROW-12201
                 URL: https://issues.apache.org/jira/browse/ARROW-12201
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, C++
    Affects Versions: 3.0.0
            Reporter: Jorge Leitão


When writing a `uint32` column, Parquet's logical type is not written, which limits interoperability with other engines.

Minimal Python reproduction:

```
import pyarrow as pa
import pyarrow.parquet as pq

data = {"uint32": [1, None, 0]}
schema = pa.schema([pa.field('uint32', pa.uint32())])

t = pa.table(data, schema=schema)
pq.write_table(t, "bla.parquet")
```
 
Inspecting it with spark:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("bla.parquet")
print(df.select("uint32").schema)
```

shows `StructType(List(StructField(uint32,LongType,true)))`. "LongType" indicates that the field was interpreted as a 64-bit signed integer. Further inspection shows that neither the convertedType nor the logicalType is being set on the column. Note that this is independent of the Arrow-specific schema written in the file metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
