[ https://issues.apache.org/jira/browse/ARROW-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phillip LeBlanc updated ARROW-17133:
------------------------------------
Summary: [Go][Parquet] PlainFixedLenByteArrayEncoder behaves differently
from DictFixedLenByteArrayEncoder with null values where schema has Nullable:
false (was: pqarrow: PlainFixedLenByteArrayEncoder behaves differently from
DictFixedLenByteArrayEncoder with null values where schema has Nullable: false)
> [Go][Parquet] PlainFixedLenByteArrayEncoder behaves differently from
> DictFixedLenByteArrayEncoder with null values where schema has Nullable: false
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17133
> URL: https://issues.apache.org/jira/browse/ARROW-17133
> Project: Apache Arrow
> Issue Type: Bug
> Components: Go, Parquet
> Affects Versions: 8.0.0
> Reporter: Phillip LeBlanc
> Priority: Minor
>
> I have created a small repro to illustrate this bug:
> https://gist.github.com/phillipleblanc/5e3e2d0e6914d276cf9fd79e019581de
> When writing a Decimal128 array to a Parquet file, the pqarrow package
> prefers DictFixedLenByteArrayEncoder. If the size of the array goes over
> some threshold, it switches to PlainFixedLenByteArrayEncoder.
>
> DictFixedLenByteArrayEncoder tolerates null values in a Decimal128 array
> whose Arrow schema is set to Nullable: false. PlainFixedLenByteArrayEncoder,
> however, does not tolerate them and panics.
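The asymmetry described above can be modelled with a small, stdlib-only Go sketch. The types `dictEncoder` and `plainEncoder` and the helper `tryPut` are hypothetical stand-ins, not the real pqarrow/parquet encoders. A nil `[]byte` models a null slot, and each type's behaviour is taken from the report rather than from the library source.

```go
package main

import "fmt"

// Hypothetical stand-ins that only model the reported behaviour; they are NOT
// the real pqarrow/parquet encoders. A nil slice models a null slot.

// dictEncoder models DictFixedLenByteArrayEncoder as reported: a nil value is
// memoized like any other key, so nulls slip through silently.
type dictEncoder struct {
	dict    map[string]int32
	indices []int32
}

func newDictEncoder() *dictEncoder { return &dictEncoder{dict: map[string]int32{}} }

func (e *dictEncoder) Put(vals [][]byte) {
	for _, v := range vals {
		idx, ok := e.dict[string(v)] // string(nil) == "", so a null becomes an ordinary key
		if !ok {
			idx = int32(len(e.dict))
			e.dict[string(v)] = idx
		}
		e.indices = append(e.indices, idx)
	}
}

// plainEncoder models PlainFixedLenByteArrayEncoder as reported: it panics on a nil value.
type plainEncoder struct {
	typeLen int
	buf     []byte
}

func (e *plainEncoder) Put(vals [][]byte) {
	for _, v := range vals {
		if v == nil {
			panic("plainEncoder: nil value in non-nullable column")
		}
		e.buf = append(e.buf, v[:e.typeLen]...)
	}
}

// tryPut runs put and converts a panic into an error so both paths can be compared.
func tryPut(put func([][]byte), vals [][]byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	put(vals)
	return nil
}

func main() {
	dec := make([]byte, 16)          // one 16-byte Decimal128 value
	vals := [][]byte{dec, nil, dec} // the middle slot is "null"

	d := newDictEncoder()
	fmt.Println(tryPut(d.Put, vals)) // dictionary path: no error

	p := &plainEncoder{typeLen: 16}
	fmt.Println(tryPut(p.Put, vals) != nil) // plain path: panics, recovered as an error
}
```

Running it prints `<nil>` for the dictionary path and `true` for the plain path, mirroring the "works sometimes, panics other times" symptom.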
>
> Having null values in an array marked as non-nullable is a bug in the user
> code; however, it was surprising that my buggy code worked sometimes and
> failed other times. I would expect either the PlainFixedLen encoder to
> handle nulls the same way as the DictFixedLen encoder, or the DictFixedLen
> encoder to panic as well.
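One way to get the consistent behaviour asked for here is to reject nulls in a non-nullable field before any encoder is chosen. The Go sketch below assumes an Arrow-style validity bitmap (least-significant bit first, 1 = valid) with no slice offset and a bitmap long enough to cover `n` slots; `nullCount` and `checkColumn` are hypothetical helpers, not pqarrow functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nullCount counts unset bits in an Arrow-style validity bitmap (LSB-first,
// 1 = valid) covering the first n slots. A nil bitmap means "no nulls".
// Assumes no offset and len(validity) >= ceil(n/8).
func nullCount(validity []byte, n int) int {
	if validity == nil {
		return 0
	}
	valid := 0
	full := n / 8
	for _, b := range validity[:full] {
		valid += bits.OnesCount8(b)
	}
	if rem := n % 8; rem != 0 {
		mask := byte(1<<rem - 1) // keep only the low rem bits of the last byte
		valid += bits.OnesCount8(validity[full] & mask)
	}
	return n - valid
}

// checkColumn is a hypothetical guard: it rejects nulls in a non-nullable field
// up front, so the outcome no longer depends on which encoder is used later.
func checkColumn(name string, nullable bool, validity []byte, n int) error {
	if nullable {
		return nil
	}
	if nc := nullCount(validity, n); nc > 0 {
		return fmt.Errorf("field %q is non-nullable but contains %d null(s)", name, nc)
	}
	return nil
}

func main() {
	validity := []byte{0b0000_0101} // slots 0 and 2 valid, slot 1 null
	fmt.Println(checkColumn("price", false, validity, 3))
	fmt.Println(checkColumn("price", true, validity, 3))
}
```

With the bitmap above, the first call returns an error reporting one null and the second returns `<nil>`. Checking up front makes the failure independent of the dictionary-to-plain fallback, at the cost of one pass over the bitmap.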
>
> One observation: most other array types tolerate nulls when the schema is
> marked non-nullable and the data is written to Parquet. This was the first
> case I found in the pqarrow package where the Arrow schema had to be marked
> Nullable in order to write arrays containing null values. Again, it is
> debatable whether this is desirable.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)