[ 
https://issues.apache.org/jira/browse/SPARK-57416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089004#comment-18089004
 ] 

Max Gekk commented on SPARK-57416:
----------------------------------

[~yadavay] Go ahead.

> Types Framework - Resolve TimeType Parquet read-path guard ON/OFF divergence 
> for TIME(MICROS, isAdjustedToUTC=true)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57416
>                 URL: https://issues.apache.org/jira/browse/SPARK-57416
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> *Parent:* SPARK-55444 (Types Framework), follow-up of PR apache/spark#55326 
> (Phase 3a - Storage Formats - Parquet).
> h3. Problem
> When the Types Framework integrates TimeType into the Parquet read path,
> TimeTypeParquetOps.requireCompatibleParquetType enforces a stricter guard than
> the inline guard it replaces in ParquetRowConverter:
> * New guard (framework ON): INT64 && unit == MICROS && !isAdjustedToUTC
> * Original guard (framework OFF): TimeLogicalTypeAnnotation && unit == MICROS
>   (does NOT check isAdjustedToUTC)
> The new guard is strictly tighter, even though the code comment claims it
> "mirrors the inline guard that existed in ParquetRowConverter before the
> framework dispatch".
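
To make the difference between the two guards concrete, here is a minimal sketch that models them as plain predicates. It does not use Spark or Parquet classes: `TimeAnnotation`, `ParquetColumn`, `guard_framework_off` and `guard_framework_on` are hypothetical names introduced only for illustration, and the sketch assumes the new guard also requires a TIME annotation to be present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeAnnotation:
    """Hypothetical stand-in for Parquet's TIME logical type annotation."""
    unit: str                  # "MILLIS", "MICROS" or "NANOS"
    is_adjusted_to_utc: bool


@dataclass(frozen=True)
class ParquetColumn:
    """Hypothetical stand-in for a Parquet primitive column."""
    physical_type: str         # e.g. "INT32" or "INT64"
    annotation: Optional[TimeAnnotation]


def guard_framework_off(col: ParquetColumn) -> bool:
    # Original inline guard as described in the ticket:
    # TIME annotation with unit MICROS; isAdjustedToUTC is not inspected.
    a = col.annotation
    return a is not None and a.unit == "MICROS"


def guard_framework_on(col: ParquetColumn) -> bool:
    # New framework guard as described in the ticket:
    # INT64 && unit == MICROS && !isAdjustedToUTC.
    # Assumption: a TIME annotation must be present for the unit to exist.
    a = col.annotation
    return (
        col.physical_type == "INT64"
        and a is not None
        and a.unit == "MICROS"
        and not a.is_adjusted_to_utc
    )


utc_micros = ParquetColumn("INT64", TimeAnnotation("MICROS", True))
local_micros = ParquetColumn("INT64", TimeAnnotation("MICROS", False))

for name, col in [("utc_micros", utc_micros), ("local_micros", local_micros)]:
    print(name, "OFF:", guard_framework_off(col), "ON:", guard_framework_on(col))
```

In this model the two guards agree on the isAdjustedToUTC=false column and disagree only on the isAdjustedToUTC=true column, which is the divergence this ticket is about.
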
> h3. Observed behavior difference
> Reading a TIME(MICROS, isAdjustedToUTC=true) column as TimeType via an explicit
> read schema (the only way to reach this case, because schema inference already
> maps isAdjustedToUTC=true to illegalType()):
> * Framework OFF -> succeeds (returns e.g. 23:59:59.123456)
> * Framework ON  -> fails with FAILED_READ_FILE / cannotCreateParquetConverterForDataTypeError
> This contradicts the "behavior is identical in both cases" guarantee of the
> framework refactor. It is an edge case (only via explicit read schema) and is
> orthogonal to the framework wiring done in #55326, hence deferred to this
> follow-up.
> h3. Scope / required decision
> Decide the intended behavior and align both paths:
> # If ON/OFF equivalence is the intent: relax the framework guard to mirror the
>   original (drop the !isAdjustedToUTC / INT64-specific checks) so framework ON
>   and OFF behave identically.
> # If the stricter check is intentional: apply the same tightening to the
>   default (non-framework) path, add a test documenting the (now intended)
>   behavior change, and update the misleading "mirrors the inline guard" comment.
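
Whichever option is chosen, ON/OFF equivalence can be checked exhaustively over the small input space the guards look at. The sketch below is only an illustration of that idea in plain Python: the guard functions and the `divergences` helper are hypothetical names rather than Spark APIs, and the guard bodies follow the descriptions in this ticket.

```python
from itertools import product

PHYSICAL_TYPES = ("INT32", "INT64")
UNITS = ("MILLIS", "MICROS", "NANOS")


def off_original(physical: str, unit: str, utc: bool) -> bool:
    # Framework OFF today: only the MICROS unit is checked.
    return unit == "MICROS"


def on_current(physical: str, unit: str, utc: bool) -> bool:
    # Framework ON today: INT64 && MICROS && !isAdjustedToUTC.
    return physical == "INT64" and unit == "MICROS" and not utc


def on_relaxed(physical: str, unit: str, utc: bool) -> bool:
    # Option 1: framework guard relaxed to mirror the original.
    return unit == "MICROS"


def off_tightened(physical: str, unit: str, utc: bool) -> bool:
    # Option 2: default path tightened to match the framework guard.
    return physical == "INT64" and unit == "MICROS" and not utc


def divergences(on_guard, off_guard):
    """Return every (physical type, unit, isAdjustedToUTC) the guards disagree on."""
    return [
        case
        for case in product(PHYSICAL_TYPES, UNITS, (False, True))
        if on_guard(*case) != off_guard(*case)
    ]


print("today:   ", divergences(on_current, off_original))
print("option 1:", divergences(on_relaxed, off_original))
print("option 2:", divergences(on_current, off_tightened))
```

In this model the current guards also disagree on INT32 columns annotated as MICROS. The Parquet format defines TIME(MICROS) on INT64, so those combinations should only appear in malformed files, but they are included because the original guard as described does not check the physical type.
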
> h3. Test cleanup
> Update TimeTypeParquetOpsSuite: its scaladoc and the comment around the
> isAdjustedToUTC case currently describe behavior in terms of the #55326 review
> history ("whichever resolution lands..."). Reword to state the chosen invariant
> as intended behavior and reference this ticket instead of the PR thread.
> TimeTypeParquetOpsSuite already pins this case as a regression hook.
> h3. Does this introduce a user-facing change?
> Potentially yes, depending on the chosen resolution (reading
> TIME(MICROS, isAdjustedToUTC=true) as TimeType via explicit read schema). The
> resolution should make ON/OFF behavior consistent and document the final
> semantics.


