[ 
https://issues.apache.org/jira/browse/SPARK-57184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-57184.
------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56235
[https://github.com/apache/spark/pull/56235]

> Null struct corrupts nested CalendarInterval column values
> ----------------------------------------------------------
>
>                 Key: SPARK-57184
>                 URL: https://issues.apache.org/jira/browse/SPARK-57184
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> SPARK-56981 / the nanosecond-timestamp column-vector work surfaced a latent
> bug in WritableColumnVector.appendStruct(boolean isNull).
>
> When a struct column is appended as NULL via appendStruct(true), the method
> recurses into child columns that are themselves struct-shaped (StructType,
> VariantType) so that their grandchild cursors stay aligned. A
> CalendarInterval child column is also struct-shaped: it is backed by three
> grandchild primitive columns (months as int, days as int, microseconds as
> long). However, the recursion guard did not include CalendarIntervalType,
> so an interval child took the else branch (c.appendNull()), which advances
> only the interval column's own cursor and leaves its three grandchild
> columns un-advanced.
>
> As a result, for a struct column with a CalendarInterval field, appending a
> NULL parent row leaves the interval's grandchild cursors behind by one. A
> subsequent non-null row then writes its months/days/microseconds into the
> wrong (earlier) grandchild slots, and reading that row back returns a
> skewed/garbage interval value. This is silent data corruption for the
> nested struct-of-interval case.
>
> Fix: include CalendarIntervalType in the recursion guard in appendStruct so
> that a null parent struct cascades appendStruct(true) into the interval
> child, advancing all three grandchild cursors.
>
> This was split out of the nanosecond-timestamp ColumnVector PR (SPARK-57100)
> per review, since it is an independent fix.
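
The cursor misalignment described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Spark's code: ToyVector, the guardIncludesInterval flag, and roundTrip are hypothetical names invented for illustration, and the fixed-size arrays stand in for the real column storage. It appends a NULL struct row followed by a row holding the interval (1 month, 2 days, 3 microseconds), then reads row 1 back.

```java
import java.util.Arrays;

// Toy model only: NOT Spark's WritableColumnVector. All names are hypothetical.
final class IntervalCursorSketch {

    enum Kind { PRIMITIVE, STRUCT, INTERVAL }

    static final class ToyVector {
        final Kind kind;
        final ToyVector[] children;
        final long[] data = new long[8];       // preallocated slots, zero-filled
        final boolean[] nulls = new boolean[8];
        int cursor = 0;                        // next row to be appended

        ToyVector(Kind kind, ToyVector... children) {
            this.kind = kind;
            this.children = children;
        }

        static ToyVector interval() {
            // months, days, microseconds as three primitive grandchildren
            return new ToyVector(Kind.INTERVAL,
                new ToyVector(Kind.PRIMITIVE),
                new ToyVector(Kind.PRIMITIVE),
                new ToyVector(Kind.PRIMITIVE));
        }

        void appendValue(long v) { data[cursor++] = v; }

        void appendNull() { nulls[cursor++] = true; }

        void appendInterval(long months, long days, long micros) {
            cursor++; // the interval column's own row
            children[0].appendValue(months);
            children[1].appendValue(days);
            children[2].appendValue(micros);
        }

        // guardIncludesInterval = false models the buggy recursion guard,
        // true models the guard after the fix described in the issue.
        void appendStruct(boolean isNull, boolean guardIncludesInterval) {
            if (!isNull) { cursor++; return; } // caller appends the children itself
            nulls[cursor++] = true;
            for (ToyVector c : children) {
                boolean structShaped = c.kind == Kind.STRUCT
                    || (guardIncludesInterval && c.kind == Kind.INTERVAL);
                if (structShaped) {
                    c.appendStruct(true, guardIncludesInterval); // cascade to grandchildren
                } else {
                    c.appendNull(); // advances only c's own cursor
                }
            }
        }

        // Reads the interval stored at the given row by indexing the grandchildren.
        long[] readInterval(int row) {
            return new long[] {
                children[0].data[row], children[1].data[row], children[2].data[row]
            };
        }
    }

    // Appends a NULL struct row, then a struct row with interval (1, 2, 3),
    // and returns what row 1 of the interval field reads back as.
    static long[] roundTrip(boolean guardIncludesInterval) {
        ToyVector interval = ToyVector.interval();
        ToyVector struct = new ToyVector(Kind.STRUCT, interval);

        struct.appendStruct(true, guardIncludesInterval);   // row 0: NULL parent
        struct.appendStruct(false, guardIncludesInterval);  // row 1: non-null parent
        interval.appendInterval(1, 2, 3);

        return interval.readInterval(1);
    }

    public static void main(String[] args) {
        System.out.println("old guard: " + Arrays.toString(roundTrip(false)));
        System.out.println("new guard: " + Arrays.toString(roundTrip(true)));
    }
}
```

With the old guard, the toy model writes the interval's parts into grandchild slot 0 and row 1 reads back as zeros; with the interval included in the guard, row 1 reads back as 1, 2, 3. In the real column vectors the stale slots can hold arbitrary earlier data, which is why the issue describes the result as a skewed or garbage value.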



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

