[ 
https://issues.apache.org/jira/browse/SPARK-57582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57582:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove dead TimeType fallback branches shadowed by the Parquet types 
> framework dispatch
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-57582
>                 URL: https://issues.apache.org/jira/browse/SPARK-57582
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Background
> {{TimeType}} Parquet I/O is handled by the types framework: 
> {{ParquetTypeOps(dt)}} returns {{Some(TimeTypeParquetOps(...))}} for 
> {{TimeType}}, and the three Parquet entry points dispatch framework-first 
> ({{ParquetTypeOps(dt).map(...).getOrElse(default)}}):
> * schema (write): {{SparkToParquetSchemaConverter.convertField}} -> 
> {{TimeTypeParquetOps.convertToParquetType}}
> * value (write): {{ParquetWriteSupport.makeWriter}} -> 
> {{TimeTypeParquetOps.makeWriter}}
> * value (row read): {{ParquetRowConverter.newConverter}} -> 
> {{TimeTypeParquetOps.newConverter}}
> Each entry point still carries an inline {{case _: TimeType}} branch in its 
> {{*Default}} fallback. Because {{ParquetTypeOps(TimeType)}} always returns 
> {{Some(...)}}, the framework op always wins and these fallbacks never execute.
> h2. Problem
> The fallbacks are dead code that nonetheless duplicates real logic. 
> SPARK-57551 extended {{TimeType}} to nanosecond precision and added the 
> {{TIME(MICROS)}} (precision 0..6) vs {{TIME(NANOS)}} (precision 7..9) 
> bifurcation, plus read-time precision truncation, to *both* the live 
> framework op and the three shadowed fallbacks. The duplicated copies are a 
> maintenance hazard:
> * an edit to a fallback has no runtime effect (silently ineffective);
> * an edit to the live op that misses the fallbacks leaves stale copies that 
> would become live (and wrong) if the framework dispatch were ever removed for 
> {{TimeType}}.
> Dead, duplicated branches (master after SPARK-57551):
> * {{ParquetSchemaConverter.scala}}: {{convertFieldDefault}}, {{case t: 
> TimeType}}
> * {{ParquetWriteSupport.scala}}: {{makeWriterDefault}}, {{case t: TimeType if 
> t.precision > MICROS_PRECISION}} and {{case _: TimeType}}
> * {{ParquetRowConverter.scala}}: {{newConverterDefault}}, {{case t: TimeType}}
> h2. Proposal
> Remove the shadowed {{TimeType}} fallback branches so {{TimeTypeParquetOps}} 
> is the single source of truth, or document that the fallbacks are 
> intentionally retained during the types-framework migration. More broadly, 
> any type fully migrated to {{ParquetTypeOps}} leaves a dead {{*Default}} 
> fallback case; audit and drop those consistently as part of the migration.
> h2. Notes
> Found during code review of SPARK-57551 (PR 
> https://github.com/apache/spark/pull/56622). No user-facing behavior change; 
> cleanup and maintainability only.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to