This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 455a57d  [MINOR][DOCS] Fix for contradiction in condition formula of keeping intermediate state of window in structured streaming docs
455a57d is described below

commit 455a57d55e88f85d9e34555cfcc845fda051cec0
Author: Viktor Tarasenko <v.tarase...@vezet.ru>
AuthorDate: Wed Feb 13 08:01:20 2019 -0600

    [MINOR][DOCS] Fix for contradiction in condition formula of keeping intermediate state of window in structured streaming docs

    This change solves contradiction in structured streaming documentation in formula which tests if specific window will be updated by calculating watermark and comparing with "T" parameter(intermediate state is cleared as (max event time seen by the engine - late threshold > T), otherwise kept(written as "until")). By further examples the "T" seems to be the end of the window, not start like documentation says firstly. For more information please take a look at my question in stackoverf [...]

    Can be tested by building documentation.

    Closes #23765 from vitektarasenko/master.

    Authored-by: Viktor Tarasenko <v.tarase...@vezet.ru>
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
    (cherry picked from commit 5894f767d1f159fc05e11d77d61089efcd0c50b4)
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
---
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3678bfb..3d91223 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -922,7 +922,7 @@ late data for that aggregate any more. To enable this, in Spark 2.1, we have int
 **watermarking**, which lets the engine automatically track the current event time in the data
 and attempt to clean up old state accordingly. You can define the watermark of a query by
 specifying the event time column and the threshold on how late the data is expected to be in terms of
-event time. For a specific window starting at time `T`, the engine will maintain state and allow late
+event time. For a specific window ending at time `T`, the engine will maintain state and allow late
 data to update the state until `(max event time seen by the engine - late threshold > T)`.
 In other words, late data within the threshold will be aggregated,
 but data later than the threshold will start getting dropped
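
The corrected condition can be illustrated with a small standalone sketch. It is plain Python rather than Spark code, and the function name `window_state_is_kept`, the window bounds, and the 10-minute threshold are all hypothetical values chosen for illustration. It also assumes the watermark is recomputed immediately for every event, which simplifies how the engine actually advances it.

```python
from datetime import datetime, timedelta


def window_state_is_kept(window_end, max_event_time, late_threshold):
    """Return True while state for the window ending at `window_end` is kept.

    Mirrors the documented condition: state is maintained until
    (max event time seen by the engine - late threshold) > T,
    where T is the END of the window (the point of this doc fix).
    """
    watermark = max_event_time - late_threshold
    return not (watermark > window_end)


# Hypothetical window 12:00-12:10 with a 10-minute late threshold.
window_end = datetime(2019, 2, 13, 12, 10)
threshold = timedelta(minutes=10)

# Max event time 12:20 gives watermark 12:10, which is not past the window end.
kept = window_state_is_kept(window_end, datetime(2019, 2, 13, 12, 20), threshold)

# Max event time 12:21 gives watermark 12:11, which is past the window end.
dropped = not window_state_is_kept(window_end, datetime(2019, 2, 13, 12, 21), threshold)
```

Under these assumptions the window's state is kept while the watermark is at or before 12:10 and is eligible for cleanup once the watermark moves past it. Had `T` been the window start (12:00), the same inputs would have cleared the state ten minutes earlier, which is the contradiction the commit message describes.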