Didn't know about that. I'll have a look at it and check whether it fixes the issue or not. Thanks!
On Thu, Oct 10, 2024, 13:29, Wenchen Fan <cloud0...@gmail.com> wrote:

> There is a `try_to_timestamp` function but not `try_to_date`; we should
> probably add it for users who don't want to get runtime errors when
> processing big datasets.
>
> On Thu, Oct 10, 2024 at 11:05 AM Ángel <angel.alvarez.pas...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I opened a Jira ticket back in August, but it seems to have been
>> overlooked. While it may not be a critical issue, I would appreciate it
>> if you could take a moment to consider it before deciding whether to
>> close it.
>>
>> Here is the ticket for reference:
>> SPARK-49288 <https://issues.apache.org/jira/browse/SPARK-49288>
>>
>> I've also written an article related to the issue, which you can find
>> here:
>> Apache Spark: WTF? Stranded on Dates Rows
>> <https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-stranded-on-dates-rows-74f0d9788b8b>
>>
>> In short, the problem occurs when the to_date built-in function
>> encounters an invalid date string. Each time this happens, a new
>> ParseException is thrown. While this isn't a big deal with small
>> datasets, when you're processing millions of rows, the sheer volume of
>> exceptions can become a significant performance issue. I understand
>> that validating date strings is expensive, but checking for empty
>> strings shouldn't be.
>>
>> I'm only asking for either an optimization for empty-string checks or,
>> at the very least, a warning in the documentation about the potential
>> performance impact.
>>
>> Thanks for taking the time to consider this.
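
[Editor's note: a minimal sketch of the two workarounds discussed above, for readers landing on this thread. The DataFrame, the column name date_str, and the format pattern are hypothetical, and the second variant assumes Spark 3.5+, where try_to_timestamp is available. This is an illustration, not the proposed SPARK-49288 fix.]

    import org.apache.spark.sql.functions._
    import spark.implicits._ // assumes an active SparkSession named `spark`

    // Hypothetical sample data with a valid date, an empty string,
    // and an invalid date string.
    val df = Seq("2024-10-10", "", "not-a-date").toDF("date_str")

    // Cheap guard: skip the parser (and its internal ParseException)
    // for empty strings, the case the ticket is about.
    val parsed = df.withColumn(
      "parsed_date",
      when(trim(col("date_str")) === "", lit(null).cast("date"))
        .otherwise(to_date(col("date_str"), "yyyy-MM-dd"))
    )

    // On Spark 3.5+, try_to_timestamp plus a cast can stand in for
    // the missing try_to_date until one is added.
    val parsed2 = df.withColumn(
      "parsed_date",
      expr("CAST(try_to_timestamp(date_str, 'yyyy-MM-dd') AS DATE)")
    )

Both variants return null for unparseable input instead of surfacing an error; the first only avoids the exception overhead for the empty-string case, while the second avoids it for every malformed string.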