Hi Sean,

Even with MM for the month, it still gives an incorrect (though
different this time) min value.
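
For reference, a minimal sketch of what I am running now (with static
imports of org.apache.spark.sql.functions; "withDate" and "parsed_date"
are just illustrative names, same *dataset* and *colName* as below):

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Parse the MM-dd-yyyy strings into a proper DateType column;
    // values that fail to parse become null.
    Dataset<Row> withDate = dataset.withColumn(
            "parsed_date", to_date(col(colName), "MM-dd-yyyy"));

    // Take the min over the parsed dates, not over the raw strings.
    Object colMin = withDate.agg(min(col("parsed_date"))).first().get(0);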

On Tue, Jun 14, 2022 at 8:18 PM Sean Owen <sro...@gmail.com> wrote:

> Yes that is right. It has to be parsed as a date to correctly reason about
> ordering. Otherwise you are finding the minimum string alphabetically.
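>
> A quick illustration with made-up values ("df" and "d" standing for
> the DataFrame and its date-string column):
>
>     // As strings, "06-14-2022" sorts before "12-01-2021" ('0' < '1'),
>     // even though 12-01-2021 is the earlier date.
>     df.agg(min(col("d")));                         // alphabetical min
>     df.agg(min(to_date(col("d"), "MM-dd-yyyy")));  // true earliest date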
>
> Small note: MM is month; mm is minute. You have to fix that for this to
> work. These are Java date format strings.
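>
> For example ("d" again standing for the string column):
>
>     to_date(col("d"), "MM-dd-yyyy")   // MM = month of year: what you want
>     to_date(col("d"), "mm-dd-yyyy")   // mm = minute of hour: wrong field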
>
> On Tue, Jun 14, 2022, 12:32 PM marc nicole <mk1853...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to detect that a column contains dates: the column holds formatted
>> strings like "06-14-2022" (the format being mm-dd-yyyy), and I want to get
>> the minimum of those dates.
>>
>> I tried in Java as follows:
>>
>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>>         dataset.col(colName), "mm-dd-yyyy").isNotNull())
>>>         .select(colName).count() != 0) { ...
>>
>>
>> And to get the *min* of the column:
>>
>>> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>>>         org.apache.spark.sql.functions.to_date(
>>>                 dataset.col(colName), "mm-dd-yyyy"))).first().get(0);
>>
>> // then I cast *colMin* to a string.
>>
>> Note that if I don't apply *to_date*() to the target column, the result
>> is erroneous (I think Spark treats the values as strings and computes the
>> min alphabetically).
>>
>> Any better approach to accomplish this?
>> Thanks.
>>
>
