[
https://issues.apache.org/jira/browse/SPARK-57464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk updated SPARK-57464:
-----------------------------
Affects Version/s: 4.3.0
(was: 5.0.0)
> Types Framework - add a string parse hook (string to internal value)
> symmetric to format
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-57464
> URL: https://issues.apache.org/jira/browse/SPARK-57464
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Umbrella: SPARK-53504 (Types Framework).
> The framework's client-side TypeApiOps trait provides string output via
> format / formatUTF8 / toSQLValue (internal value -> string), and
> cast-to-string is fully routed through it (e.g. SPARK-57285 for nanosecond
> timestamps). There is no symmetric inverse: no parse / parseUTF8 hook to turn
> a string into the type's internal value. As a result, the string -> value
> direction (CAST(string AS T) and default to_timestamp/to_time-style parsing)
> is still implemented per type outside the framework (e.g. in Cast.scala), so
> each new framework type scatters this logic further.
> Proposal:
> - Add an optional parse hook to TypeApiOps, e.g. def parse(s: UTF8String):
> Option[Any] (or def parseString(s: String)), returning the type's internal
> representation, symmetric to format. Default None so existing types fall
> through to legacy handling.
> - Route the cast-from-string path (Cast / ToStringBase counterpart) through
> the hook the same way cast-to-string already flows through format, with
> codegen and interpreted parity.
> - Provide the reference implementation for TimeType, and let the nanosecond
> timestamp types (SPARK-56822) reuse it.
> Notes / scope boundary:
> - The framework hook uses the type's default/fraction formatter. Datasource
> readers that honor user-configurable patterns (JSON/CSV/XML timestampFormat,
> locale, zone) are out of scope here; those keep their own configurable
> formatters.
> - This is a framework primitive, consumed by expressions/cast rather than an
> expression itself (consistent with SPARK-53504 scope).
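
For illustration only, the proposal above could be sketched roughly as follows. This is a minimal, self-contained sketch and not Spark code: TypeOpsSketch, TimeOpsSketch, LegacyOpsSketch and CastFromStringSketch are hypothetical names, a plain String stands in for UTF8String, the internal value of the TIME-like type is assumed to be nanoseconds since midnight, and java.time's ISO_LOCAL_TIME stands in for the type's default formatter.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical stand-in for the framework trait; NOT Spark's actual TypeApiOps.
trait TypeOpsSketch {
  // Existing direction: internal value -> string.
  def format(value: Any): String

  // Proposed inverse: string -> internal value.
  // The default None means "no hook here, fall through to legacy handling".
  def parse(s: String): Option[Any] = None
}

// Illustrative TIME-like type. Assumption: the internal value is a Long
// holding nanoseconds since midnight.
object TimeOpsSketch extends TypeOpsSketch {
  private val formatter = DateTimeFormatter.ISO_LOCAL_TIME

  override def format(value: Any): String = value match {
    case nanos: Long => LocalTime.ofNanoOfDay(nanos).format(formatter)
    case other =>
      throw new IllegalArgumentException(s"Unexpected internal value: $other")
  }

  // Returns None when the input does not match the default format.
  override def parse(s: String): Option[Any] =
    Try(LocalTime.parse(s.trim, formatter).toNanoOfDay).toOption
}

// A type that has not opted in: it inherits the default parse (None).
object LegacyOpsSketch extends TypeOpsSketch {
  override def format(value: Any): String = String.valueOf(value)
}

object CastFromStringSketch {
  // Try the framework hook first; otherwise use the per-type legacy path.
  def cast(ops: TypeOpsSketch, s: String)(legacy: String => Any): Any =
    ops.parse(s).getOrElse(legacy(s))
}
```

In this sketch, None covers both "no hook defined" and "input did not parse", so malformed input for an opted-in type also reaches the legacy path. A real design may want to tell those two cases apart, for example to raise an ANSI-mode cast error instead of falling through.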