[ 
https://issues.apache.org/jira/browse/SPARK-57464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57464:
-----------------------------
    Affects Version/s: 4.3.0
                           (was: 5.0.0)

> Types Framework - add a string parse hook (string to internal value) 
> symmetric to format
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-57464
>                 URL: https://issues.apache.org/jira/browse/SPARK-57464
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Umbrella: SPARK-53504 (Types Framework).
> The framework's client-side TypeApiOps trait provides string output via 
> format / formatUTF8 / toSQLValue (internal value -> string), and 
> cast-to-string is fully routed through it (e.g. SPARK-57285 for nanosecond 
> timestamps). There is no symmetric inverse: no parse / parseUTF8 hook to turn 
> a string into the type's internal value. As a result, the string -> value 
> direction (CAST(string AS T), default to_timestamp/to_time style parsing) is 
> still implemented per type outside the framework (e.g. in Cast.scala), so 
> each new framework type adds yet another scattered copy of this logic.
> Proposal:
> - Add an optional parse hook to TypeApiOps, e.g. def parse(s: UTF8String): 
> Option[Any] (or def parseString(s: String)), returning the type's internal 
> representation, symmetric to format. The default returns None, so existing 
> types fall through to legacy handling.
> - Route the cast-from-string path (Cast / ToStringBase counterpart) through 
> the hook the same way cast-to-string already flows through format, with 
> codegen and interpreted parity.
> - Provide the reference implementation for TimeType, and let the nanosecond 
> timestamp types (SPARK-56822) reuse it.
> Notes / scope boundary:
> - The framework hook uses the type's default/fraction formatter. Datasource 
> readers that honor user-configurable patterns (JSON/CSV/XML timestampFormat, 
> locale, zone) are out of scope here; those keep their own configurable 
> formatters.
> - This is a framework primitive, consumed by expressions/cast rather than an 
> expression itself (consistent with SPARK-53504 scope).
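A minimal sketch of the proposed hook, under stated assumptions. TypeOpsSketch, TimeOpsSketch, LegacyOnlyOpsSketch and castFromString are hypothetical stand-ins, not Spark's actual TypeApiOps or Cast code. The sketch uses String instead of UTF8String so it runs without Spark on the classpath, and it assumes the TIME-like type stores nanoseconds since midnight as a Long.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object ParseHookSketch {
  // Hypothetical stand-in for the framework trait; NOT Spark's real TypeApiOps.
  trait TypeOpsSketch {
    // Existing direction: internal value -> string.
    def format(v: Any): String
    // Proposed direction: string -> internal value.
    // Defaults to None, so types without a hook fall through to legacy handling.
    def parse(s: String): Option[Any] = None
  }

  // Assumption: internal value is nanoseconds since midnight, stored as a Long.
  object TimeOpsSketch extends TypeOpsSketch {
    override def format(v: Any): String =
      DateTimeFormatter.ISO_LOCAL_TIME.format(LocalTime.ofNanoOfDay(v.asInstanceOf[Long]))

    // Uses the fixed ISO local-time format; no user-configurable pattern.
    override def parse(s: String): Option[Any] =
      Try(LocalTime.parse(s.trim).toNanoOfDay).toOption
  }

  // A type that has not adopted the hook yet: only format is implemented.
  object LegacyOnlyOpsSketch extends TypeOpsSketch {
    override def format(v: Any): String = v.toString
  }

  // Cast-from-string dispatch: try the hook first, then the legacy per-type code.
  def castFromString(
      ops: TypeOpsSketch,
      s: String,
      legacy: String => Option[Any]): Option[Any] =
    ops.parse(s).orElse(legacy(s))
}
```

With a plain Option return type, "no hook implemented" and "input could not be parsed" both surface as None in this sketch. The issue text does not say how the real hook would distinguish the two cases.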

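The scope boundary in the notes can be illustrated the same way. In this sketch, parseDefault plays the role of the framework hook with a fixed default format, while parseWithOptions plays the role of a datasource reader that takes its pattern and locale from user options. Both names are hypothetical, and the nanoseconds-since-midnight Long is again an assumption about the internal representation.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object ScopeBoundarySketch {
  // Framework-style hook: fixed ISO local-time format, no user options.
  def parseDefault(s: String): Option[Long] =
    Try(LocalTime.parse(s.trim).toNanoOfDay).toOption

  // Datasource-style reader: pattern and locale come from user configuration.
  def parseWithOptions(s: String, pattern: String, locale: Locale): Option[Long] =
    Try {
      val fmt = DateTimeFormatter.ofPattern(pattern, locale)
      LocalTime.parse(s.trim, fmt).toNanoOfDay
    }.toOption
}
```

An input such as "13.30.00" is rejected by the default parser but accepted by the configurable one when given the pattern "HH.mm.ss", which is why the two paths keep separate formatters.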


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
