[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun closed SPARK-48019.
---------------------------------
> ColumnVectors with dictionaries and nulls are not read/copied correctly
> -----------------------------------------------------------------------
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.3
> Reporter: Gene Pang
> Assignee: Gene Pang
> Priority: Major
> Labels: correctness, pull-request-available
> Fix For: 3.5.2, 3.4.4, 4.0.0
>
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those
> return a primitive array with the contents of the vector. When the
> ColumnVector has a dictionary, the values are decoded with the dictionary
> before filling in the primitive array.
> However, a {{ColumnVector}} can contain nulls, and for a {{null}} entry the
> dictionary id is irrelevant and may be invalid. The dictionary should not
> be consulted for the {{null}} entries of the vector. Doing so can
> sometimes cause an {{ArrayIndexOutOfBoundsException}}.
> Apart from the possible exception, copying a {{ColumnarArray}} is also
> incorrect. A {{ColumnarArray}} wraps a {{ColumnVector}}, so it can contain
> {{null}} values. However, {{copy()}} for primitive types ignores the
> null-ness of the entries and blindly copies all the primitive values, so
> the null entries are lost in the copy.
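
The two problems described above can be illustrated with a minimal,
standalone sketch. It does not use Spark's classes and is not the code from
the fix. It assumes plain arrays as stand-ins for a dictionary-encoded
vector: dictIds holds the per-row dictionary ids, isNull holds the null
flags, and dictionary holds the decoded values. The class and method names
are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for a dictionary-encoded int vector; not Spark code.
final class NullAwareDecodeSketch {

    // Naive decode: looks up every id, including the ids stored at null slots.
    static int[] decodeNaive(int[] dictIds, int[] dictionary) {
        int[] out = new int[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            // Throws if a null slot happens to hold an invalid id.
            out[i] = dictionary[dictIds[i]];
        }
        return out;
    }

    // Null-aware decode: the dictionary is never consulted for null entries.
    static int[] decodeSkippingNulls(int[] dictIds, boolean[] isNull, int[] dictionary) {
        int[] out = new int[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            if (!isNull[i]) {
                out[i] = dictionary[dictIds[i]];
            }
            // Null slots keep the default value 0; callers must check isNull.
        }
        return out;
    }

    // Null-preserving copy: boxes the values so that null entries survive.
    static Integer[] copyPreservingNulls(int[] values, boolean[] isNull) {
        Integer[] out = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = isNull[i] ? null : Integer.valueOf(values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dictionary = {10, 20};
        int[] dictIds = {0, -1, 1};              // -1 is a garbage id at a null slot
        boolean[] isNull = {false, true, false};

        try {
            decodeNaive(dictIds, dictionary);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive decode failed on the null slot");
        }

        int[] decoded = decodeSkippingNulls(dictIds, isNull, dictionary);
        System.out.println(Arrays.toString(decoded));                              // [10, 0, 20]
        System.out.println(Arrays.toString(copyPreservingNulls(decoded, isNull))); // [10, null, 20]
    }
}
```

In this sketch, a copy that returned only the primitive array would report
0 for the middle entry, which is the loss of null-ness the description
refers to.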
--
This message was sent by Atlassian Jira
(v8.20.10#820010)