[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351188#comment-16351188 ]
Felix Cheung commented on SPARK-23314:
--------------------------------------

Thanks. I have isolated this to a different subset of the data, but I have not yet been able to pinpoint the exact row (mostly because the value displayed is in local time while the data is in UTC, and nothing matches after adjusting for the time zone).

The problem might be in the data itself. In that case, is there a way to help debug this?

> Pandas grouped udf on dataset with timestamp column error
> ----------------------------------------------------------
>
>                 Key: SPARK-23314
>                 URL: https://issues.apache.org/jira/browse/SPARK-23314
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>   Affects Versions: 2.3.0
>            Reporter: Felix Cheung
>            Priority: Major
>
> Under SPARK-22216
> When testing pandas_udf on group bys, I saw this error with the timestamp
> column.
> File "pandas/_libs/tslib.pyx", line 3593, in
> pandas._libs.tslib.tz_localize_to_utc
> AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-11-01
> 01:29:30'), try using the 'ambiguous' argument
> For details, see Comment box. I'm able to reproduce this on the latest
> branch-2.3 (last change from Feb 1 UTC)
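
On the debugging question, one possible way to narrow down the offending rows outside Spark is to localize the timestamp column in pandas with `ambiguous="NaT"` and keep whatever comes back as NaT. This is only a minimal sketch under assumptions: the time zone `America/Los_Angeles` is a guess (the issue does not name the session time zone), `find_ambiguous` is a hypothetical helper name, and the sample values are made up except for the one quoted in the traceback.

```python
import pandas as pd

# Assumption: the local time zone observes US daylight saving time.
# The issue does not state the actual zone, so adjust as needed.
LOCAL_TZ = "America/Los_Angeles"


def find_ambiguous(naive: pd.Series, tz: str = LOCAL_TZ) -> pd.Series:
    """Return the naive timestamps that are ambiguous wall-clock times in tz."""
    # ambiguous="NaT" turns ambiguous times into NaT instead of raising
    # AmbiguousTimeError, so the bad rows can be selected afterwards.
    localized = naive.dt.tz_localize(tz, ambiguous="NaT")
    return naive[localized.isna() & naive.notna()]


# Illustrative data only; the middle value is the one from the traceback.
sample = pd.Series(pd.to_datetime([
    "2015-10-31 01:29:30",
    "2015-11-01 01:29:30",
    "2015-11-01 03:00:00",
]))

suspects = find_ambiguous(sample)
print(suspects)
```

Running the same helper over the timestamp column of the isolated subset (for example after collecting it with `toPandas()`, if that step itself succeeds) should list the wall-clock times that fall inside the repeated hour at the end of DST.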