Repository: spark-website
Updated Branches:
  refs/heads/asf-site a78faf582 -> eee58685c
replace with valid url to rdd paper

Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/eee58685
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/eee58685
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/eee58685

Branch: refs/heads/asf-site
Commit: eee58685c39269c191a921c39f1520c747a42318
Parents: a78faf5
Author: Xin Ren <iamsh...@126.com>
Authored: Fri Sep 16 16:31:23 2016 -0700
Committer: Xin Ren <iamsh...@126.com>
Committed: Fri Sep 16 16:31:23 2016 -0700

----------------------------------------------------------------------
 research.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark-website/blob/eee58685/research.md
----------------------------------------------------------------------
diff --git a/research.md b/research.md
index 41841a1..ec7dd54 100644
--- a/research.md
+++ b/research.md
@@ -27,7 +27,7 @@ Traditional MapReduce and DAG engines are suboptimal for these applications beca
 </p>
 
 <p>
-Spark offers an abstraction called <a href="http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf"><em>resilient distributed datasets (RDDs)</em></a> to support these applications efficiently. RDDs can be stored in memory between queries <em>without</em> requiring replication. Instead, they rebuild lost data on failure using <em>lineage</em>: each RDD remembers how it was built from other datasets (by transformations like <code>map</code>, <code>join</code> or <code>groupBy</code>) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (<a href="http://shark.cs.berkeley.edu">Shark</a>). 
+Spark offers an abstraction called <a href="http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf"><em>resilient distributed datasets (RDDs)</em></a> to support these applications efficiently. RDDs can be stored in memory between queries <em>without</em> requiring replication. Instead, they rebuild lost data on failure using <em>lineage</em>: each RDD remembers how it was built from other datasets (by transformations like <code>map</code>, <code>join</code> or <code>groupBy</code>) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (<a href="http://shark.cs.berkeley.edu">Shark</a>). 
 </p>
 
 <p class="noskip">You can find more about the research behind Spark in the following papers:</p>