[
https://issues.apache.org/jira/browse/SPARK-57416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089004#comment-18089004
]
Max Gekk commented on SPARK-57416:
----------------------------------
[~yadavay] Go ahead.
> Types Framework - Resolve TimeType Parquet read-path guard ON/OFF divergence
> for TIME(MICROS, isAdjustedToUTC=true)
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-57416
> URL: https://issues.apache.org/jira/browse/SPARK-57416
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
>
> *Parent:* SPARK-55444 (Types Framework), follow-up of PR apache/spark#55326
> (Phase 3a - Storage Formats - Parquet).
> h3. Problem
> When the Types Framework integrates TimeType into the Parquet read path,
> TimeTypeParquetOps.requireCompatibleParquetType enforces a stricter guard than
> the inline guard it replaces in ParquetRowConverter:
> * New guard (framework ON): INT64 && unit == MICROS && !isAdjustedToUTC
> * Original guard (framework OFF): TimeLogicalTypeAnnotation && unit == MICROS
> (does NOT check isAdjustedToUTC)
> The new guard is strictly tighter, even though the code comment claims it
> "mirrors the inline guard that existed in ParquetRowConverter before the
> framework dispatch".
> h3. Observed behavior difference
> Reading a TIME(MICROS, isAdjustedToUTC=true) column as TimeType via an
> explicit
> read schema (reachable because schema inference already maps
> isAdjustedToUTC=true
> to illegalType()):
> * Framework OFF -> succeeds (returns e.g. 23:59:59.123456)
> * Framework ON -> fails with FAILED_READ_FILE /
> cannotCreateParquetConverterForDataTypeError
> This contradicts the "behavior is identical in both cases" guarantee of the
> framework refactor. It is an edge case (only via explicit read schema) and is
> orthogonal to the framework wiring done in #55326, hence deferred to this
> follow-up.
> h3. Scope / required decision
> Decide the intended behavior and align both paths:
> # If ON/OFF equivalence is the intent: relax the framework guard to mirror the
> original (drop the !isAdjustedToUTC / INT64-specific checks) so framework ON
> and OFF behave identically.
> # If the stricter check is intentional: apply the same tightening to the
> *Default (non-framework) path, add a test documenting the (now intended)
> behavior change, and update the misleading "mirrors the inline guard"
> comment.
> h3. Test cleanup
> Update TimeTypeParquetOpsSuite: its scaladoc and the comment around the
> isAdjustedToUTC case currently describe behavior in terms of the #55326 review
> history ("whichever resolution lands..."). Reword to state the chosen
> invariant
> as intended behavior and reference this ticket instead of the PR thread.
> TimeTypeParquetOpsSuite already pins this case as a regression hook.
> h3. Does this introduce a user-facing change?
> Potentially yes, depending on the chosen resolution (reading
> TIME(MICROS, isAdjustedToUTC=true) as TimeType via explicit read schema). The
> resolution should make ON/OFF behavior consistent and document the final
> semantics.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]