Alexis Seigneurin created SPARK-20110:
-----------------------------------------
Summary: Windowed aggregation does not work when the timestamp is a
nested field
Key: SPARK-20110
URL: https://issues.apache.org/jira/browse/SPARK-20110
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.1.0
Reporter: Alexis Seigneurin
I am loading data into a DataFrame with nested fields. I want to perform a
windowed aggregation on a timestamp taken from a nested field:
{code}
.groupBy(window($"auth.sysEntryTimestamp", "2 minutes"))
{code}
I get the following error:
{quote}
org.apache.spark.sql.AnalysisException: Multiple time window expressions would
result in a cartesian product of rows, therefore they are not currently not
supported.
{quote}
This works fine if I first extract the timestamp to a separate column:
{code}
.withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
.groupBy(
  window($"sysEntryTimestamp", "2 minutes")
)
{code}
Please see the complete samples:
- batch:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4683710270868386/4278399007363210/3769253384867782/latest.html
- Structured Streaming:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4683710270868386/4278399007363192/3769253384867782/latest.html
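For readers who cannot open the notebooks, the workaround described above can be sketched as a small self-contained program. This is only an illustration under assumptions: the case classes {{Auth}} and {{Event}}, the {{id}} field, the sample rows, and the local SparkSession are all hypothetical; only the nested field name {{auth.sysEntryTimestamp}} and the 2-minute window come from this report.
{code}
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

// Hypothetical schema: only the path auth.sysEntryTimestamp is taken from the report.
case class Auth(sysEntryTimestamp: Timestamp)
case class Event(id: Long, auth: Auth)

object NestedWindowWorkaround {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to illustrate the query shape.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-20110 workaround sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows with a nested timestamp field.
    val events = Seq(
      Event(1L, Auth(Timestamp.valueOf("2017-03-27 10:00:30"))),
      Event(2L, Auth(Timestamp.valueOf("2017-03-27 10:01:10"))),
      Event(3L, Auth(Timestamp.valueOf("2017-03-27 10:03:00")))
    ).toDF()

    // Workaround from the report: copy the nested timestamp into a
    // top-level column first, then window on that column.
    val counts = events
      .withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
      .groupBy(window($"sysEntryTimestamp", "2 minutes"))
      .count()

    counts.show(truncate = false)
    spark.stop()
  }
}
{code}
Replacing the {{withColumn}} and {{groupBy}} lines with a single {{.groupBy(window($"auth.sysEntryTimestamp", "2 minutes"))}} gives the query shape that this report says fails on 2.1.0.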