This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 455a57d  [MINOR][DOCS] Fix for contradiction in condition formula of keeping intermediate state of window in structured streaming docs
455a57d is described below

commit 455a57d55e88f85d9e34555cfcc845fda051cec0
Author: Viktor Tarasenko <v.tarase...@vezet.ru>
AuthorDate: Wed Feb 13 08:01:20 2019 -0600

    [MINOR][DOCS] Fix for contradiction in condition formula of keeping intermediate state of window in structured streaming docs
    
    This change resolves a contradiction in the structured streaming documentation. The formula that decides whether a specific window can still be updated compares the watermark with a parameter "T": intermediate state is kept until (max event time seen by the engine - late threshold > T) and cleared afterwards. The later examples imply that "T" is the end of the window, not its start as the documentation first says. For more information please take a look at my question in stackoverf [...]
    
    Can be tested by building documentation.
    
    Closes #23765 from vitektarasenko/master.
    
    Authored-by: Viktor Tarasenko <v.tarase...@vezet.ru>
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
    (cherry picked from commit 5894f767d1f159fc05e11d77d61089efcd0c50b4)
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
---
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3678bfb..3d91223 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -922,7 +922,7 @@ late data for that aggregate any more. To enable this, in Spark 2.1, we have int
 **watermarking**, which lets the engine automatically track the current event time in the data
 and attempt to clean up old state accordingly. You can define the watermark of a query by 
 specifying the event time column and the threshold on how late the data is expected to be in terms of 
-event time. For a specific window starting at time `T`, the engine will maintain state and allow late
+event time. For a specific window ending at time `T`, the engine will maintain state and allow late
 data to update the state until `(max event time seen by the engine - late threshold > T)`. 
 In other words, late data within the threshold will be aggregated, 
 but data later than the threshold will start getting dropped
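
The corrected condition can be checked with a small worked example. This is only a sketch in plain Python, not Spark code: the function name `window_state_is_kept`, the 10-minute threshold, and the 12:00-12:10 window are assumptions chosen for illustration, and it ignores details of the real engine such as when the watermark is recomputed.

```python
from datetime import datetime, timedelta


def window_state_is_kept(max_event_time, late_threshold, window_end):
    """Hypothetical helper: True while state for the window ending at
    window_end (the "T" of the formula) would still be kept."""
    watermark = max_event_time - late_threshold
    # Per the documented condition, state is dropped once watermark > T.
    return not (watermark > window_end)


# Assumed example values: a window of 12:00-12:10 and a 10-minute threshold.
window_end = datetime(2019, 2, 13, 12, 10)
threshold = timedelta(minutes=10)

# Max event time 12:20 gives a watermark of 12:10, which is not greater than T.
kept_at_12_20 = window_state_is_kept(datetime(2019, 2, 13, 12, 20), threshold, window_end)

# Max event time 12:21 gives a watermark of 12:11, which is greater than T.
kept_at_12_21 = window_state_is_kept(datetime(2019, 2, 13, 12, 21), threshold, window_end)

print(kept_at_12_20, kept_at_12_21)
```

If `T` were instead read as the window start (12:00 here), the same formula would drop the state as soon as the maximum event time passed 12:10, that is, with no allowance for late data, which is the contradiction the commit message describes.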


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
