Vishal Doshi created SPARK-22070:
------------------------------------
Summary: Spark SQL filter comparisons failing with timestamps and
ISO-8601 strings
Key: SPARK-22070
URL: https://issues.apache.org/jira/browse/SPARK-22070
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.2.0
Reporter: Vishal Doshi
Priority: Minor
The filter appears to ignore the time portion of an ISO-8601 string: the comparison behaves as if only the date were compared. Code to reproduce:
{code}
import datetime
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType
spark = SparkSession.builder.getOrCreate()
data = [{"dates": datetime.datetime(2017, 1, 1, 12)}]
schema = StructType([StructField("dates", TimestampType())])
df = spark.createDataFrame(data, schema=schema)
# df.head() returns (correctly):
# Row(dates=datetime.datetime(2017, 1, 1, 12, 0))
df.filter(df["dates"] > datetime.datetime(2017, 1, 1, 11).isoformat()).count()
# should return 1, instead returns 0
# datetime.datetime(2017, 1, 1, 11).isoformat() returns '2017-01-01T11:00:00'
df.filter(df["dates"] > datetime.datetime(2016, 12, 31, 11).isoformat()).count()
# this one works
{code}
Of course, the simple workaround is to use datetime objects themselves in the
query expression, but in practice this means using dateutil to parse the
incoming data, which is not ideal.
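As a minimal sketch of that workaround without the dateutil dependency: strings in the exact format shown above ('2017-01-01T11:00:00', no timezone or fractional seconds) can be parsed with the stdlib before they reach the filter, so the comparison uses a real timestamp instead of relying on Spark's string coercion. The `parse_iso` helper and its format string are assumptions for illustration, not part of the report:

```python
import datetime

def parse_iso(s):
    # Parse an ISO-8601 string like '2017-01-01T11:00:00' into a
    # datetime object; the format string assumes no timezone offset
    # or fractional seconds, matching the strings in this report.
    return datetime.datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")

cutoff = parse_iso("2017-01-01T11:00:00")
# df.filter(df["dates"] > cutoff).count()
# comparing against the datetime object returns 1, as expected
```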
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)