[ 
https://issues.apache.org/jira/browse/ARROW-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip LeBlanc updated ARROW-17133:
------------------------------------
    Summary: [Go][Parquet] PlainFixedLenByteArrayEncoder behaves differently 
from DictFixedLenByteArrayEncoder with null values where schema has Nullable: 
false  (was: pqarrow: PlainFixedLenByteArrayEncoder behaves differently from 
DictFixedLenByteArrayEncoder with null values where schema has Nullable: false)

> [Go][Parquet] PlainFixedLenByteArrayEncoder behaves differently from 
> DictFixedLenByteArrayEncoder with null values where schema has Nullable: false
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17133
>                 URL: https://issues.apache.org/jira/browse/ARROW-17133
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go, Parquet
>    Affects Versions: 8.0.0
>            Reporter: Phillip LeBlanc
>            Priority: Minor
>
> I have created a small repro to illustrate this bug: 
> https://gist.github.com/phillipleblanc/5e3e2d0e6914d276cf9fd79e019581de
> When writing a Decimal128 array to a Parquet file, the pqarrow package 
> prefers DictFixedLenByteArrayEncoder. If the size of the array goes over 
> some threshold, it switches to PlainFixedLenByteArrayEncoder.
> The DictFixedLenByteArrayEncoder tolerates null values in a Decimal128 array 
> whose Arrow schema field is set to Nullable: false. The 
> PlainFixedLenByteArrayEncoder does not tolerate them and panics.
> Having null values in an array marked as non-nullable is an issue in the 
> user code. However, it was surprising that my buggy code worked sometimes 
> and failed at other times. I would expect either the PlainFixedLen encoder 
> to handle nulls the same way as the DictFixedLen encoder, or the 
> DictFixedLen encoder to panic as well.
> One more observation: most other array types tolerate nulls when written to 
> Parquet with the schema marked as non-nullable. This was the first case I 
> found in the pqarrow package where the Arrow schema had to be marked as 
> Nullable in order to write an array containing null values. Again, it is 
> debatable whether this is desirable.
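
Until the two encoders agree, user code can guard against the mismatch by
rejecting nulls in non-nullable columns before handing data to the writer.
The following is only a minimal sketch of that check. The field and column
types, checkNulls, and errNullInRequired are hypothetical stand-ins so the
example runs with the standard library alone; they are not part of the Arrow
Go module. In real code the same information would presumably come from the
Nullable flag on arrow.Field and the NullN() method of the array.

```go
package main

import (
	"errors"
	"fmt"
)

// field and column are hypothetical stand-ins for an Arrow schema field and
// an Arrow array. They are NOT types from the Arrow Go module.
type field struct {
	name     string
	nullable bool
}

type column struct {
	valid []bool // valid[i] == false means row i is null
}

// nullCount returns the number of null rows in the column.
func (c column) nullCount() int {
	n := 0
	for _, ok := range c.valid {
		if !ok {
			n++
		}
	}
	return n
}

// errNullInRequired is a hypothetical sentinel error for this sketch.
var errNullInRequired = errors.New("null value in non-nullable column")

// checkNulls returns an error for the first non-nullable field whose column
// contains at least one null, or if the field and column counts differ.
func checkNulls(fields []field, cols []column) error {
	if len(fields) != len(cols) {
		return fmt.Errorf("schema has %d fields but record has %d columns",
			len(fields), len(cols))
	}
	for i, f := range fields {
		if n := cols[i].nullCount(); !f.nullable && n > 0 {
			return fmt.Errorf("%w: field %q has %d null(s)",
				errNullInRequired, f.name, n)
		}
	}
	return nil
}

func main() {
	fields := []field{{name: "amount", nullable: false}}
	cols := []column{{valid: []bool{true, false, true}}}

	// Fail early with a clear error instead of depending on which
	// encoder happens to be selected at write time.
	if err := checkNulls(fields, cols); err != nil {
		fmt.Println("refusing to write:", err)
	}
}
```

Running a check like this before every write makes the failure deterministic:
the buggy input is rejected regardless of array size or encoder choice.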



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
