Max Gekk created SPARK-57464:
--------------------------------
Summary: Types Framework - add a string parse hook (string to
internal value) symmetric to format
Key: SPARK-57464
URL: https://issues.apache.org/jira/browse/SPARK-57464
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 5.0.0
Reporter: Max Gekk
Umbrella: SPARK-53504 (Types Framework).
The framework's client-side TypeApiOps trait provides string output via format
/ formatUTF8 / toSQLValue (internal value -> string), and cast-to-string is
fully routed through it (e.g. SPARK-57285 for nanosecond timestamps). There is
no symmetric inverse: no parse / parseUTF8 hook turns a string into the type's
internal value. As a result, the string -> value direction (CAST(string AS T),
default to_timestamp/to_time-style parsing) is still implemented per type
outside the framework (e.g. in Cast.scala), so every new framework type has to
scatter this logic again.
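A minimal, self-contained Scala model of the asymmetry described above follows. All names (`SqlType`, `OutputOnlyOps`, `TimeOutputOps`, `castStringTo`) are hypothetical stand-ins, not Spark's classes. `String` replaces `UTF8String` so the snippet runs without Spark, and TIME values are assumed to be stored as nanoseconds since midnight in a `Long`. Formatting goes through a per-type ops object, but parsing has to be hard-coded per type inside the cast.

```scala
import java.time.LocalTime
import scala.util.Try

// Hypothetical stand-ins for framework types; not Spark's real classes.
sealed trait SqlType
case object TimeSqlType extends SqlType
case object IntSqlType extends SqlType

// Today's shape: only the output direction (internal value -> string).
trait OutputOnlyOps {
  def format(v: Any): String
}

object TimeOutputOps extends OutputOnlyOps {
  // Assumption: TIME is stored as a Long of nanoseconds since midnight.
  def format(v: Any): String = LocalTime.ofNanoOfDay(v.asInstanceOf[Long]).toString
}

// The input direction has no hook, so the cast matches on every type;
// each new type adds another case here, outside the framework.
def castStringTo(dt: SqlType, s: String): Option[Any] = dt match {
  case TimeSqlType => Try(LocalTime.parse(s.trim).toNanoOfDay).toOption
  case IntSqlType  => s.trim.toIntOption
}
```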
Proposal:
- Add an optional parse hook to TypeApiOps, e.g. def parse(s: UTF8String):
Option[Any] (or def parseString(s: String)), returning the type's internal
representation, symmetric to format. The default implementation returns None,
so existing types fall through to legacy handling.
- Route the cast-from-string path (in Cast, the counterpart of ToStringBase)
through the hook, the same way cast-to-string already flows through format,
keeping codegen and interpreted evaluation in parity.
- Provide the reference implementation for TimeType, and let the nanosecond
timestamp types (SPARK-56822) reuse it.
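The proposal can be sketched as follows, under stated assumptions. `TypeApiOps` below is a simplified stand-in for the real trait, `String` replaces `UTF8String`, TIME is assumed to be stored as nanoseconds since midnight, and `legacyCast` / `LegacyIntOps` are hypothetical placeholders for the existing per-type code in Cast.scala. The sketch shows the default None, a TimeType implementation that inverts format, and a cast path that consults the hook before falling back.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Simplified stand-in for the framework trait; the real signatures may differ.
trait TypeApiOps {
  // Existing direction: internal value -> string.
  def format(v: Any): String
  // Proposed inverse: string -> internal value. The default None means
  // "no hook", so callers fall back to legacy handling.
  // Assumption: a type that implements the hook throws on malformed input
  // instead of returning None, so a bad string is not sent to the fallback.
  def parse(s: String): Option[Any] = None
}

// Reference implementation for TIME.
// Assumption: the internal value is a Long of nanoseconds since midnight.
object TimeTypeApiOps extends TypeApiOps {
  private val fmt = DateTimeFormatter.ISO_LOCAL_TIME
  override def format(v: Any): String =
    LocalTime.ofNanoOfDay(v.asInstanceOf[Long]).format(fmt)
  override def parse(s: String): Option[Any] =
    Some(LocalTime.parse(s.trim, fmt).toNanoOfDay) // throws DateTimeParseException
}

// A type that has not adopted the hook keeps the default None.
object LegacyIntOps extends TypeApiOps {
  def format(v: Any): String = v.toString
}

// Hypothetical placeholder for the existing per-type logic in Cast.scala.
def legacyCast(s: String): Any = s.trim.toInt

// Cast-from-string consults the hook first, mirroring how
// cast-to-string already goes through format.
def castFromString(ops: TypeApiOps, s: String): Any =
  ops.parse(s).getOrElse(legacyCast(s))
```

Using None only for "no hook" keeps the fallback unambiguous; whether malformed input should be signalled by an exception, a null, or a richer result type is not decided by the proposal.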
Notes / scope boundary:
- The framework hook uses the type's default/fraction formatter. Datasource
readers that honor user-configurable patterns (JSON/CSV/XML timestampFormat,
locale, zone) are out of scope here; those keep their own configurable
formatters.
- This is a framework primitive, consumed by expressions/cast rather than an
expression itself (consistent with SPARK-53504 scope).
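The scope boundary can be illustrated with a small sketch: the framework-side parser takes no pattern, locale, or zone, while a reader-side parser builds its own formatter from user options. `FrameworkTimeParse`, `ReaderTimeParse`, and `ReaderTimeOptions` are hypothetical names, not Spark's JSON/CSV/XML option classes, and TIME is again assumed to be stored as nanoseconds since midnight.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Framework side: one fixed default formatter, no user options.
// Assumption: TIME is stored as nanoseconds since midnight.
object FrameworkTimeParse {
  private val default = DateTimeFormatter.ISO_LOCAL_TIME
  def parse(s: String): Option[Long] =
    Try(LocalTime.parse(s.trim, default).toNanoOfDay).toOption
}

// Reader side (hypothetical): keeps its own formatter built from
// user-supplied options such as a pattern and a locale.
final case class ReaderTimeOptions(pattern: String, locale: Locale = Locale.US)

object ReaderTimeParse {
  def parse(s: String, opts: ReaderTimeOptions): Option[Long] = {
    val fmt = DateTimeFormatter.ofPattern(opts.pattern, opts.locale)
    Try(LocalTime.parse(s.trim, fmt).toNanoOfDay).toOption
  }
}
```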
--
This message was sent by Atlassian Jira
(v8.20.10#820010)