andygrove opened a new issue, #4619: URL: https://github.com/apache/datafusion-comet/issues/4619
## What is the problem the feature request solves?

The structured-text functions have no Comet implementation, so any query using them falls back to Spark for the enclosing operator:

- CSV: `from_csv`, `schema_of_csv`
- JSON: `schema_of_json`, `json_object_keys`
- XPath: `xpath`, `xpath_boolean/short/int/long/float/double/string`
- XML (Spark 4.0+): `from_xml`, `to_xml`, `schema_of_xml`

They are hard to implement natively in Rust (CSV/JSON/XML parsing with Spark-specific semantics).

## Describe the potential solution

These all extend Spark's `CodegenFallback`, which the codegen dispatcher already admits (the same mechanism backing `from_json`/`to_json`). Routing them through the dispatcher keeps a top-level projection native while matching Spark exactly.

On Spark 3.4/3.5 they are plain expressions and can be registered directly in the serde maps. On Spark 4.x they are `RuntimeReplaceable` and the optimizer rewrites them to `Invoke(evaluator)` / `StaticInvoke` before Comet sees the plan, so they must be dispatched from the 4.x shim (mirroring how `from_json`/`to_json`/`parse_url` are already handled).

## Additional context

Tier 2 of the codegen-dispatch expansion identified in #4616. Related: the HOF tier in #4618.
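The version split described under "Describe the potential solution" can be pictured as a small routing table. The sketch below is only an illustrative model, not Comet code: `Route`, `route`, and `STRUCTURED_TEXT_FUNCTIONS` are hypothetical names invented for this example, and the only facts it encodes are the function list and the 3.4/3.5 versus 4.x rule stated in this issue.

```python
from enum import Enum

# Function names as listed in this issue; the grouping keys are illustrative.
STRUCTURED_TEXT_FUNCTIONS = {
    "csv": ["from_csv", "schema_of_csv"],
    "json": ["schema_of_json", "json_object_keys"],
    "xpath": ["xpath"] + [
        f"xpath_{t}"
        for t in ("boolean", "short", "int", "long", "float", "double", "string")
    ],
    "xml": ["from_xml", "to_xml", "schema_of_xml"],  # Spark 4.0+ only
}


class Route(Enum):
    """Hypothetical labels for where a function would be wired in."""

    SERDE_MAP = "register directly in the serde maps"
    SHIM_4X = "dispatch from the 4.x shim"
    UNAVAILABLE = "function does not exist in this Spark version"


def route(function_name: str, spark_version: str) -> Route:
    """Return the wiring point this issue proposes for a function and version."""
    family = next(
        (f for f, names in STRUCTURED_TEXT_FUNCTIONS.items() if function_name in names),
        None,
    )
    if family is None:
        raise ValueError(f"not a structured-text function: {function_name}")

    parts = spark_version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if major == 4:
        # RuntimeReplaceable on 4.x: rewritten before Comet sees the plan.
        return Route.SHIM_4X
    if (major, minor) in {(3, 4), (3, 5)}:
        # The XML functions are listed as Spark 4.0+ in this issue.
        if family == "xml":
            return Route.UNAVAILABLE
        # Plain expressions on 3.4/3.5.
        return Route.SERDE_MAP
    raise ValueError(f"Spark version not covered by this issue: {spark_version}")


if __name__ == "__main__":
    for name, version in [("from_csv", "3.5.1"), ("xpath_int", "4.0.0"), ("from_xml", "3.4.0")]:
        print(f"{name} on Spark {version}: {route(name, version).value}")
```

Versions outside 3.4, 3.5, and 4.x raise an error in this sketch because the issue does not say how they should be handled.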
