Max Gekk created SPARK-57416:
--------------------------------

             Summary: Types Framework - Resolve TimeType Parquet read-path 
guard ON/OFF divergence for TIME(MICROS, isAdjustedToUTC=true)
                 Key: SPARK-57416
                 URL: https://issues.apache.org/jira/browse/SPARK-57416
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


*Parent:* SPARK-55444 (Types Framework), follow-up of PR apache/spark#55326 
(Phase 3a - Storage Formats - Parquet).

h3. Problem

When the Types Framework integrates TimeType into the Parquet read path,
TimeTypeParquetOps.requireCompatibleParquetType enforces a stricter guard than
the inline guard it replaces in ParquetRowConverter:

* New guard (framework ON): INT64 && unit == MICROS && !isAdjustedToUTC
* Original guard (framework OFF): TimeLogicalTypeAnnotation && unit == MICROS
  (does NOT check isAdjustedToUTC)

The new guard is strictly tighter, even though the code comment claims it
"mirrors the inline guard that existed in ParquetRowConverter before the
framework dispatch".

h3. Observed behavior difference

Reading a TIME(MICROS, isAdjustedToUTC=true) column as TimeType via an explicit
read schema (reachable because schema inference already maps 
isAdjustedToUTC=true
to illegalType()):

* Framework OFF -> succeeds (returns e.g. 23:59:59.123456)
* Framework ON  -> fails with FAILED_READ_FILE / 
cannotCreateParquetConverterForDataTypeError

This contradicts the "behavior is identical in both cases" guarantee of the
framework refactor. It is an edge case (only via explicit read schema) and is
orthogonal to the framework wiring done in #55326, hence deferred to this
follow-up.

h3. Scope / required decision

Decide the intended behavior and align both paths:

# If ON/OFF equivalence is the intent: relax the framework guard to mirror the
  original (drop the !isAdjustedToUTC / INT64-specific checks) so framework ON
  and OFF behave identically.
# If the stricter check is intentional: apply the same tightening to the
  *Default (non-framework) path, add a test documenting the (now intended)
  behavior change, and update the misleading "mirrors the inline guard" comment.

h3. Test cleanup

Update TimeTypeParquetOpsSuite: its scaladoc and the comment around the
isAdjustedToUTC case currently describe behavior in terms of the #55326 review
history ("whichever resolution lands..."). Reword to state the chosen invariant
as intended behavior and reference this ticket instead of the PR thread.
TimeTypeParquetOpsSuite already pins this case as a regression hook.

h3. Does this introduce a user-facing change?

Potentially yes, depending on the chosen resolution (reading
TIME(MICROS, isAdjustedToUTC=true) as TimeType via explicit read schema). The
resolution should make ON/OFF behavior consistent and document the final
semantics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to