Omega359 opened a new issue, #10744: URL: https://github.com/apache/datafusion/issues/10744
### Is your feature request related to a problem or challenge? During work on adding a 'safe' mode to to_timestamp and to_date functions I've come across an issue that I would like feedback on before proceeding. ### The feature Currently for timestamp and date parsing if a source string cannot be parsed using any of the provided chrono formats datafusion will return an error. This is normal for a database-type solution however it is not ideal for a system that is parsing billions of dates in batches - some of which are human entered. Systems such as Spark default to a null value for anything that cannot be parsed and this feature enables a mode ('safe' to mirror the name and same behaviour as CastOptions) that allows the to_timestamp* and to_date UDFs to have the same behaviour (return null on error). ### The problem Since UDF functions have no context provided and there isn't a way I know of to statically get access to config to add the avove mentioned functionality I resorted to using a new constructor function to allow the UDF to switch behaviour: ``` #[derive(Debug)] pub struct ToTimestampFunc { signature: Signature, /// how to handle cast or parsing failures, either return NULL (safe=true) or return ERR (safe=false) pub safe: bool, } impl ToTimestampFunc { pub fn new() -> Self { Self { signature: Signature::variadic_any(Volatility::Immutable), safe: false, } } pub fn new_with_safe(safe: bool) -> Self { Self { signature: Signature::variadic_any(Volatility::Immutable), safe, } } } ``` To use the alternative 'safe' mode for these functions is as simple as ``` let to_timestamp = ToTimestampFunc::new_with_safe(true); session_context.register_udf(ScalarUDF::new_from_impl(to_timestamp)); ``` Unfortunately this only affect sql queries - any calls to the [to_timestamp(args: Vec<Expr>)](https://github.com/apache/datafusion/blob/09dde27be39ad054f85dfb5c37b7468a3f68d652/datafusion/functions/src/datetime/mod.rs#L257) function will not use the new definition as registered in the function registry. This is because that function and every other function like it use a static singleton instance that only uses a ::new() call to initial it and there is no way that I can see to replace that instance. ### Describe the solution you'd like I see a few possible solutions to this: - Acknowledge the difference in behaviour in documentation and recommend using `ctx.udf("to_timestamp").unwrap().call(args)` instead of the to_timestamp() function anytime 'safe' mode is required. This is less than ideal imho as it can lead to confusion and unintuitive behavior. - Provide a mechanism (function argument to invoke perhaps) to provide a SessionContext to all UDF's so that they can adjust behaviour based on configuration at call time. - Update the to_timestamp() function and all similar functions to have alternatives that do not use the statically defined version of the UDF but rather create it on-demand or use the definition saved in the session state. As examples: ``` // the existing function pub fn to_timestamp(args: Vec<Expr>) -> Expr { super::to_timestamp().call(args) } pub fn to_timestamp_safe(args: Vec<Expr>) -> Expr { ScalarUDF::new_from_impl(ToTimestampFunc::new_with_safe(true)).call(args) } // or pub fn to_timestamp_from_context(ctx &Arc<SessionContext>, args: Vec<Expr>) -> Expr { ctx.udf("to_timestamp").unwrap().call(args) } // varargs would have been a really nice solution as well but 🤷 ``` - something else I haven't thought of. Any opinions, suggestions and critiques would be welcome. ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org