Repository: spark
Updated Branches:
  refs/heads/master c62263340 -> 5d00a7bc1


[SPARK-16256][DOCS] Fix window operation diagram

Author: Tathagata Das <tathagata.das1...@gmail.com>

Closes #14001 from tdas/SPARK-16256-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d00a7bc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d00a7bc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d00a7bc

Branch: refs/heads/master
Commit: 5d00a7bc19ddeb1b5247733b55095a03ee7b1a30
Parents: c622633
Author: Tathagata Das <tathagata.das1...@gmail.com>
Authored: Thu Jun 30 14:01:34 2016 -0700
Committer: Tathagata Das <tathagata.das1...@gmail.com>
Committed: Thu Jun 30 14:01:34 2016 -0700

----------------------------------------------------------------------
 docs/img/structured-streaming-late-data.png    | Bin 138931 -> 138226 bytes
 docs/img/structured-streaming-window.png       | Bin 128930 -> 132875 bytes
 docs/img/structured-streaming.pptx             | Bin 1105315 -> 1105413 bytes
 docs/structured-streaming-programming-guide.md |   2 +-
 4 files changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/5d00a7bc/docs/img/structured-streaming-late-data.png
----------------------------------------------------------------------
diff --git a/docs/img/structured-streaming-late-data.png b/docs/img/structured-streaming-late-data.png
index 5276b47..2283f67 100644
Binary files a/docs/img/structured-streaming-late-data.png and b/docs/img/structured-streaming-late-data.png differ

http://git-wip-us.apache.org/repos/asf/spark/blob/5d00a7bc/docs/img/structured-streaming-window.png
----------------------------------------------------------------------
diff --git a/docs/img/structured-streaming-window.png b/docs/img/structured-streaming-window.png
index be9d3fb..c1842b1 100644
Binary files a/docs/img/structured-streaming-window.png and b/docs/img/structured-streaming-window.png differ

http://git-wip-us.apache.org/repos/asf/spark/blob/5d00a7bc/docs/img/structured-streaming.pptx
----------------------------------------------------------------------
diff --git a/docs/img/structured-streaming.pptx b/docs/img/structured-streaming.pptx
index c278323..6aad2ed 100644
Binary files a/docs/img/structured-streaming.pptx and b/docs/img/structured-streaming.pptx differ

http://git-wip-us.apache.org/repos/asf/spark/blob/5d00a7bc/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 5932566..7949396 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -620,7 +620,7 @@ df.groupBy("type").count()
 ### Window Operations on Event Time
 Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration.
 
-Imagine the quick example is modified and the stream contains lines along with the time when the line was generated. Instead of running word counts, we want to count words within 10 minute windows, updating every 5 minutes. That is, word counts in words received between 10 minute windows 12:00 - 12:10, 12:05 - 12:15, 12:10 - 12:20, etc. Note that 12:00 - 12:10 means data that arrived after 12:00 but before 12:10. Now, consider a word that was received at 12:07. This word should increment the counts corresponding to two windows 12:00 - 12:10 and 12:05 - 12:15. So the counts will be indexed by both, the grouping key (i.e. the word) and the window (can be calculated from the event-time).
+Imagine our quick example is modified and the stream now contains lines along with the time when the line was generated. Instead of running word counts, we want to count words within 10 minute windows, updating every 5 minutes. That is, word counts in words received between 10 minute windows 12:00 - 12:10, 12:05 - 12:15, 12:10 - 12:20, etc. Note that 12:00 - 12:10 means data that arrived after 12:00 but before 12:10. Now, consider a word that was received at 12:07. This word should increment the counts corresponding to two windows 12:00 - 12:10 and 12:05 - 12:15. So the counts will be indexed by both, the grouping key (i.e. the word) and the window (can be calculated from the event-time).
 
 The result tables would look something like the following.
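
The window arithmetic described in the paragraph above (a word received at 12:07 incrementing both the 12:00 - 12:10 and the 12:05 - 12:15 windows) can be sketched in plain Python. This is an illustrative helper, not Spark's API; for simplicity it aligns window starts to midnight, whereas Spark's `window()` function aligns sliding windows to the Unix epoch by default.

```python
def sliding_windows(hour, minute, size=10, slide=5):
    """Return the (start, end) windows, in minutes since midnight, that
    contain the given event time, for size-minute windows sliding every
    slide minutes (window starts aligned to midnight for simplicity)."""
    m = hour * 60 + minute
    start = (m // slide) * slide  # latest slide boundary at or before the event
    windows = []
    # Walk back one slide at a time while the window still covers the event;
    # windows are half-open [start, end), matching "after 12:00 but before 12:10"
    while start >= 0 and start + size > m:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

def fmt(window):
    start, end = window
    return f"{start // 60:02d}:{start % 60:02d} - {end // 60:02d}:{end % 60:02d}"

# A word received at 12:07 falls into exactly two of the 10-minute windows
print([fmt(w) for w in sliding_windows(12, 7)])
# → ['12:00 - 12:10', '12:05 - 12:15']
```

Because the slide (5 minutes) is half the window size (10 minutes), every event falls into exactly two windows, which is why the counts are keyed by both the word and the window.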
 

