Felix Cheung commented on SPARK-23314:

Thanks. I have narrowed this down to a different subset of the data, but I am 
not yet able to pinpoint the exact row: the displayed values are mostly in 
local time while the data is in UTC, and nothing matches after adjusting for 
the time zone. The problem might be in the data itself. If so, is there a way 
to help debug this?
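
One possible way to narrow down the offending rows, sketched here under 
assumptions rather than taken from the Spark code: localize the naive 
timestamps with ambiguous="NaT" so that pandas marks ambiguous wall-clock 
times instead of raising, then select the rows that became NaT. The helper name 
find_ambiguous, the sample values, and the zone America/Los_Angeles are 
hypothetical; the zone actually in effect for the session should be 
substituted.

```python
import pandas as pd


def find_ambiguous(ts: pd.Series, tz: str) -> pd.Series:
    """Return the naive timestamps in `ts` that are ambiguous in zone `tz`.

    Hypothetical helper for debugging. It does not handle nonexistent
    (spring-forward) times, which would still raise.
    """
    # ambiguous="NaT" turns DST-ambiguous wall-clock times into NaT
    # instead of raising AmbiguousTimeError.
    localized = ts.dt.tz_localize(tz, ambiguous="NaT")
    # Keep rows that were valid before localization but are NaT afterwards.
    return ts[localized.isna() & ts.notna()]


# Sample data (made up), including the value from the error message.
sample = pd.Series(pd.to_datetime([
    "2015-11-01 00:59:59",
    "2015-11-01 01:29:30",
    "2015-11-01 02:00:00",
]))

# The zone is an assumption; use the one the session actually runs with.
suspects = find_ambiguous(sample, "America/Los_Angeles")
print(suspects)
```

Running the same helper on the timestamp column of the failing subset (for 
example after collecting it with toPandas()) should list the candidate rows, 
assuming the error comes from ambiguous local times and not from something 
else in the data.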

> Pandas grouped udf on dataset with timestamp column error 
> ----------------------------------------------------------
>                 Key: SPARK-23314
>                 URL: https://issues.apache.org/jira/browse/SPARK-23314
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Felix Cheung
>            Priority: Major
> Under SPARK-22216
> When testing pandas_udf on group bys, I saw this error with the timestamp 
> column.
> File "pandas/_libs/tslib.pyx", line 3593, in pandas._libs.tslib.tz_localize_to_utc
> AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-11-01 01:29:30'), try using the 'ambiguous' argument
> For details, see Comment box. I'm able to reproduce this on the latest 
> branch-2.3 (last change from Feb 1 UTC)

