Livia Zhu created SPARK-50492:
---------------------------------

             Summary: Fix java.util.NoSuchElementException when watermark 
column is dropped after dropDuplicatesWithinWatermark
                 Key: SPARK-50492
                 URL: https://issues.apache.org/jira/browse/SPARK-50492
             Project: Spark
          Issue Type: Task
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Livia Zhu


Consider the following query:

```
val result = inputData.toDF()
  .select("_1", "_2")
  .withColumn("timestamp", to_timestamp($"_2", "yyyy-MM-dd HH:mm:ss"))
  .withWatermark("timestamp", "24 hours")
  .dropDuplicatesWithinWatermark("timestamp")
  .select("_1") // "timestamp" is not part of the final projection
```
 
Currently, the ColumnPruning optimizer rule prunes the `timestamp` column 
because it is not selected in the final Project. This leads to a 
`java.util.NoSuchElementException` when DeduplicateWithinWatermarkExec tries 
to look up the event time column.
 
We need to update the DeduplicateWithinWatermark logical plan node so that 
its references include the event time column, which keeps ColumnPruning from 
dropping it.
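
To illustrate why a node's declared references matter to column pruning, here is a minimal, self-contained sketch. It is a toy model and not Spark's Catalyst code: `Plan`, `Source`, `Project`, `Dedup`, `prune`, and the `declareEventTime` flag are all hypothetical names invented for this example. The sketch also deduplicates on `_1` (an assumption made for illustration) so that nothing except the dedup operator itself needs the event time column.

```scala
import scala.util.{Failure, Try}

// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
object PruningSketch {
  sealed trait Plan {
    def output: Seq[String]     // columns this node produces
    def references: Set[String] // columns this node declares it needs
  }

  final case class Source(output: Seq[String]) extends Plan {
    def references: Set[String] = Set.empty
  }

  final case class Project(columns: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = columns
    def references: Set[String] = columns.toSet
  }

  // declareEventTime = false mimics the reported behavior (event time column
  // missing from references); true mimics the proposed change.
  final case class Dedup(keys: Seq[String], eventTimeCol: String,
                         declareEventTime: Boolean, child: Plan) extends Plan {
    def output: Seq[String] = child.output
    def references: Set[String] =
      keys.toSet ++ (if (declareEventTime) Set(eventTimeCol) else Set.empty[String])
  }

  // Simplified pruning: each node keeps only the columns required by its
  // parent plus the columns it declares in `references`.
  def prune(plan: Plan, required: Set[String]): Plan = plan match {
    case Source(cols) => Source(cols.filter(required))
    case Project(cols, child) =>
      val kept = cols.filter(required)
      Project(kept, prune(child, kept.toSet))
    case d: Dedup => d.copy(child = prune(d.child, required ++ d.references))
  }

  // Stand-in for the physical operator's lookup: `.get` on an empty Option
  // throws java.util.NoSuchElementException.
  def eventTimeColumn(d: Dedup): String =
    d.child.output.find(_ == d.eventTimeCol).get

  // Builds Project(_1) over Dedup over Source, prunes it, then does the lookup.
  def lookupAfterPruning(declareEventTime: Boolean): Try[String] = {
    val top = Project(Seq("_1"),
      Dedup(Seq("_1"), "timestamp", declareEventTime,
        Source(Seq("_1", "_2", "timestamp"))))
    prune(top, top.output.toSet) match {
      case Project(_, d: Dedup) => Try(eventTimeColumn(d))
      case other => Failure(new IllegalStateException(s"unexpected plan: $other"))
    }
  }
}
```

In this toy model, `PruningSketch.lookupAfterPruning(declareEventTime = false)` fails with a `java.util.NoSuchElementException` because `timestamp` was pruned from the dedup node's input, while `declareEventTime = true` keeps the column available. The real rule and node in Spark are more involved; the sketch only shows the general mechanism described above.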



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
