[ 
https://issues.apache.org/jira/browse/SPARK-57582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57582:
-----------------------------
    Description: 
h2. Background

{{TimeType}} Parquet I/O is handled by the types framework: 
{{ParquetTypeOps(dt)}} returns {{Some(TimeTypeParquetOps(...))}} for 
{{TimeType}}, and the three Parquet entry points dispatch framework-first 
({{ParquetTypeOps(dt).map(...).getOrElse(default)}}):
* schema (write): {{SparkToParquetSchemaConverter.convertField}} -> 
{{TimeTypeParquetOps.convertToParquetType}}
* value (write): {{ParquetWriteSupport.makeWriter}} -> 
{{TimeTypeParquetOps.makeWriter}}
* value (row read): {{ParquetRowConverter.newConverter}} -> 
{{TimeTypeParquetOps.newConverter}}

Each entry point still carries an inline {{case _: TimeType}} branch in its 
{{*Default}} fallback. Because {{ParquetTypeOps(TimeType)}} always returns 
{{Some(...)}}, the framework op always wins and these fallbacks never execute.

h2. Problem

The fallbacks are dead code that nonetheless duplicates real logic. SPARK-57551 
extended {{TimeType}} to nanosecond precision and added the {{TIME(MICROS)}} 
(precision 0..6) vs {{TIME(NANOS)}} (precision 7..9) bifurcation, plus 
read-time precision truncation, to *both* the live framework op and the three 
shadowed fallbacks. The duplicated copies are a maintenance hazard:
* an edit to a fallback has no runtime effect (silently ineffective);
* an edit to the live op that misses the fallbacks leaves stale copies that 
would become live (and wrong) if the framework dispatch were ever removed for 
{{TimeType}}.

Dead, duplicated branches (master after SPARK-57551):
* {{ParquetSchemaConverter.scala}}: {{convertFieldDefault}}, {{case t: 
TimeType}}
* {{ParquetWriteSupport.scala}}: {{makeWriterDefault}}, {{case t: TimeType if 
t.precision > MICROS_PRECISION}} and {{case _: TimeType}}
* {{ParquetRowConverter.scala}}: {{newConverterDefault}}, {{case t: TimeType}}

h2. Proposal

Remove the shadowed {{TimeType}} fallback branches so {{TimeTypeParquetOps}} is 
the single source of truth, or document that the fallbacks are intentionally 
retained during the types-framework migration. More broadly, any type fully 
migrated to {{ParquetTypeOps}} leaves a dead {{*Default}} fallback case; audit 
and drop those consistently as part of the migration.

h2. Notes

Found during code review of SPARK-57551 (PR 
https://github.com/apache/spark/pull/56622). No user-facing behavior change; 
cleanup and maintainability only.


> Remove dead TimeType fallback branches shadowed by the Parquet types 
> framework dispatch
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-57582
>                 URL: https://issues.apache.org/jira/browse/SPARK-57582
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> h2. Background
> {{TimeType}} Parquet I/O is handled by the types framework: 
> {{ParquetTypeOps(dt)}} returns {{Some(TimeTypeParquetOps(...))}} for 
> {{TimeType}}, and the three Parquet entry points dispatch framework-first 
> ({{ParquetTypeOps(dt).map(...).getOrElse(default)}}):
> * schema (write): {{SparkToParquetSchemaConverter.convertField}} -> 
> {{TimeTypeParquetOps.convertToParquetType}}
> * value (write): {{ParquetWriteSupport.makeWriter}} -> 
> {{TimeTypeParquetOps.makeWriter}}
> * value (row read): {{ParquetRowConverter.newConverter}} -> 
> {{TimeTypeParquetOps.newConverter}}
> Each entry point still carries an inline {{case _: TimeType}} branch in its 
> {{*Default}} fallback. Because {{ParquetTypeOps(TimeType)}} always returns 
> {{Some(...)}}, the framework op always wins and these fallbacks never execute.
> h2. Problem
> The fallbacks are dead code that nonetheless duplicates real logic. 
> SPARK-57551 extended {{TimeType}} to nanosecond precision and added the 
> {{TIME(MICROS)}} (precision 0..6) vs {{TIME(NANOS)}} (precision 7..9) 
> bifurcation, plus read-time precision truncation, to *both* the live 
> framework op and the three shadowed fallbacks. The duplicated copies are a 
> maintenance hazard:
> * an edit to a fallback has no runtime effect (silently ineffective);
> * an edit to the live op that misses the fallbacks leaves stale copies that 
> would become live (and wrong) if the framework dispatch were ever removed for 
> {{TimeType}}.
> Dead, duplicated branches (master after SPARK-57551):
> * {{ParquetSchemaConverter.scala}}: {{convertFieldDefault}}, {{case t: 
> TimeType}}
> * {{ParquetWriteSupport.scala}}: {{makeWriterDefault}}, {{case t: TimeType if 
> t.precision > MICROS_PRECISION}} and {{case _: TimeType}}
> * {{ParquetRowConverter.scala}}: {{newConverterDefault}}, {{case t: TimeType}}
> h2. Proposal
> Remove the shadowed {{TimeType}} fallback branches so {{TimeTypeParquetOps}} 
> is the single source of truth, or document that the fallbacks are 
> intentionally retained during the types-framework migration. More broadly, 
> any type fully migrated to {{ParquetTypeOps}} leaves a dead {{*Default}} 
> fallback case; audit and drop those consistently as part of the migration.
> h2. Notes
> Found during code review of SPARK-57551 (PR 
> https://github.com/apache/spark/pull/56622). No user-facing behavior change; 
> cleanup and maintainability only.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to