andygrove opened a new issue, #4619:
URL: https://github.com/apache/datafusion-comet/issues/4619

   ## What is the problem the feature request solves?
   
   The structured-text functions have no Comet implementation, so any query 
using them falls back to Spark for the enclosing operator:
   
   - CSV: `from_csv`, `schema_of_csv`
   - JSON: `schema_of_json`, `json_object_keys`
   - XPath: `xpath`, `xpath_boolean`, `xpath_short`, `xpath_int`, `xpath_long`, `xpath_float`, `xpath_double`, `xpath_string`
   - XML (Spark 4.0+): `from_xml`, `to_xml`, `schema_of_xml`
   
   They are hard to implement natively in Rust because the CSV/JSON/XML parsing 
must reproduce Spark-specific semantics.
   
   ## Describe the potential solution
   
   These all extend Spark's `CodegenFallback`, which the codegen dispatcher 
already admits (the same mechanism backing `from_json`/`to_json`). Routing them 
through the dispatcher keeps a top-level projection native while matching Spark 
exactly.
   
   On Spark 3.4/3.5 they are plain expressions and can be registered directly 
in the serde maps. On Spark 4.x they are `RuntimeReplaceable` and the optimizer 
rewrites them to `Invoke(evaluator)` / `StaticInvoke` before Comet sees the 
plan, so they must be dispatched from the 4.x shim (mirroring how 
`from_json`/`to_json`/`parse_url` are already handled).
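
The version split described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Comet's actual code: `Expr`, `DIRECT_SERDE`, `SHIM_TARGETS`, `route`, and the `Placeholder*Evaluator` names are all hypothetical and stand in for the real serde maps, the 4.x shim, and Spark's evaluator classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for a Spark expression node (not a real Spark or Comet class)."""
    name: str                      # node kind, e.g. "from_csv" or "Invoke"
    target: Optional[str] = None   # evaluator/class called by Invoke or StaticInvoke


# Hypothetical serde registry for Spark 3.4/3.5, where these are plain expressions.
# The XML functions are absent because they only exist on Spark 4.0+.
DIRECT_SERDE = {
    "from_csv", "schema_of_csv",
    "schema_of_json", "json_object_keys",
    "xpath", "xpath_boolean", "xpath_short", "xpath_int", "xpath_long",
    "xpath_float", "xpath_double", "xpath_string",
}

# Hypothetical 4.x shim table. The evaluator names are placeholders, not Spark's real ones.
SHIM_TARGETS = {
    "PlaceholderCsvEvaluator": "from_csv",
    "PlaceholderXmlEvaluator": "from_xml",
}


def route(expr: Expr, spark_major: int) -> str:
    """Return "codegen-dispatch" or "spark-fallback" for one expression node."""
    if spark_major < 4:
        # 3.4/3.5: the function arrives under its own name, so a map lookup suffices.
        return "codegen-dispatch" if expr.name in DIRECT_SERDE else "spark-fallback"
    # 4.x: the optimizer has already rewritten the function, so match the rewritten node.
    if expr.name in ("Invoke", "StaticInvoke") and expr.target in SHIM_TARGETS:
        return "codegen-dispatch"
    return "spark-fallback"
```

In this model, `route(Expr("from_csv"), 3)` is resolved by name, while on 4.x only the rewritten `Invoke`/`StaticInvoke` node is ever visible, which is why the match has to live in the shim rather than in the shared serde maps.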
   
   ## Additional context
   
   This is Tier 2 of the codegen-dispatch expansion identified in #4616. 
Related: the HOF tier in #4618.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

