[ https://issues.apache.org/jira/browse/SPARK-57184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk resolved SPARK-57184.
------------------------------
Fix Version/s: 4.3.0
Resolution: Fixed
Issue resolved by pull request 56235
[https://github.com/apache/spark/pull/56235]
> Null struct corrupts nested CalendarInterval column values
> ----------------------------------------------------------
>
> Key: SPARK-57184
> URL: https://issues.apache.org/jira/browse/SPARK-57184
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> SPARK-56981 / the nanosecond-timestamp column-vector work surfaced a latent
> bug in WritableColumnVector.appendStruct(boolean isNull).
>
> When a struct column is appended as NULL via appendStruct(true), the method
> recurses into child columns that are themselves struct-shaped (StructType,
> VariantType) so that their grandchild cursors stay aligned. A
> CalendarInterval child column is also struct-shaped: it is backed by three
> grandchild primitive columns (months as int, days as int, microseconds as
> long). However, the recursion guard did not include CalendarIntervalType, so
> an interval child took the else branch (c.appendNull()), which advances only
> the interval column's own cursor and leaves its three grandchild columns
> un-advanced.
>
> As a result, for a struct column with a CalendarInterval field, appending a
> NULL parent row leaves the interval's grandchild cursors behind by one. A
> subsequent non-null row then writes its months/days/microseconds into the
> wrong (earlier) grandchild slots, and reading that row back returns a
> skewed/garbage interval value. This is silent data corruption for the nested
> struct-of-interval case.
>
> Fix: include CalendarIntervalType in the recursion guard in appendStruct so
> that a null parent struct cascades appendStruct(true) into the interval
> child, advancing all three grandchild cursors.
>
> This was split out of the nanosecond-timestamp ColumnVector PR (SPARK-57100)
> per review, since it is an independent fix.
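
The cursor skew described above can be reproduced in a small, self-contained Java model. This is a hypothetical sketch, not Spark code: the class IntervalCursorDemo, its Col type, and the helpers appendNullStruct, appendRow, and readInterval are invented for illustration. Only the shape of the recursion guard follows the issue text, and VariantType is omitted. The guardIncludesInterval flag switches between the old guard (only struct children recurse) and the fixed guard (interval children recurse too).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the cursor bookkeeping described in SPARK-57184.
// Nothing here is Spark's real WritableColumnVector API.
public class IntervalCursorDemo {

    enum Kind { PRIMITIVE, STRUCT, INTERVAL }

    // A column has a row cursor (one null flag per appended row), a primitive
    // payload (only meaningful for PRIMITIVE columns), and child columns.
    static final class Col {
        final Kind kind;
        final List<Boolean> nulls = new ArrayList<>();
        final List<Long> data = new ArrayList<>();
        final List<Col> children = new ArrayList<>();

        Col(Kind kind, Col... kids) {
            this.kind = kind;
            children.addAll(Arrays.asList(kids));
        }

        int cursor() { return nulls.size(); }

        // Advances only this column's own cursor, never its children's.
        void appendNull() { nulls.add(true); data.add(0L); }

        void appendNotNull(long v) { nulls.add(false); data.add(v); }
    }

    // Mirrors the shape of appendStruct(true): mark the row null, then recurse
    // into struct-shaped children so grandchild cursors stay aligned.
    static void appendNullStruct(Col struct, boolean guardIncludesInterval) {
        struct.appendNull();
        for (Col c : struct.children) {
            boolean recurse = c.kind == Kind.STRUCT
                    || (guardIncludesInterval && c.kind == Kind.INTERVAL);
            if (recurse) {
                appendNullStruct(c, guardIncludesInterval);
            } else {
                c.appendNull(); // old path for INTERVAL: grandchildren left behind
            }
        }
    }

    // Interval modelled as three grandchild columns: months, days, microseconds.
    static Col structOfInterval() {
        Col interval = new Col(Kind.INTERVAL,
                new Col(Kind.PRIMITIVE), new Col(Kind.PRIMITIVE), new Col(Kind.PRIMITIVE));
        return new Col(Kind.STRUCT, interval);
    }

    // Appends a non-null parent row holding a non-null interval value.
    static void appendRow(Col parent, int months, int days, long micros) {
        parent.appendNotNull(0L);
        Col iv = parent.children.get(0);
        iv.appendNotNull(0L);
        iv.children.get(0).appendNotNull(months);
        iv.children.get(1).appendNotNull(days);
        iv.children.get(2).appendNotNull(micros);
    }

    // Reads a row the way a reader would: the same index in every grandchild.
    static long[] readInterval(Col parent, int row) {
        Col iv = parent.children.get(0);
        long[] out = new long[3];
        for (int i = 0; i < 3; i++) out[i] = iv.children.get(i).data.get(row);
        return out;
    }

    // Row 0 is a NULL parent struct; rows 1 and 2 hold real intervals.
    static long[] scenario(boolean guardIncludesInterval) {
        Col parent = structOfInterval();
        appendNullStruct(parent, guardIncludesInterval);
        appendRow(parent, 1, 2, 3L);
        appendRow(parent, 4, 5, 6L);
        return readInterval(parent, 1);
    }

    public static void main(String[] args) {
        System.out.println("old guard: " + Arrays.toString(scenario(false)));
        System.out.println("new guard: " + Arrays.toString(scenario(true)));
    }
}
```

In this model, the old guard makes row 1 read back as [4, 5, 6] (row 2's values), because the grandchild columns are one slot behind after the NULL parent row. With the interval case added to the guard, row 1 reads back as [1, 2, 3].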
--
This message was sent by Atlassian Jira
(v8.20.10#820010)