Hi,

I've organized the GSoC pages on the Gora wiki and created a template for reports [1], after checking the previous Nutch and Gora reports.

For my GSoC period, I started with the paper that introduced Spark [2] and then finished the RDD paper [3]. I've also started reading Spark's documentation. I'm planning to continue with the Dryad [4] and YARN [5] papers. After these, my aim is to have a comprehensive understanding of why Spark was introduced, how it relates to Hadoop, how it differs from other related frameworks, and how it is implemented from an architectural perspective.

The next step will be diving into Gora (including picking up issues to solve - by the way, you are welcome to suggest any issue!) and implementing a piece of code that transforms GoraInputFormat into a Spark RDD.

What do you suggest for my next steps? (Everybody can comment on this, not just my mentors.) Also, Lewis and Talat, when would you like me to start the weekly reporting process?

Kind Regards,
Furkan KAMACI

PS: I'm blogging about my GSoC process on my personal blog [6].

[1] https://cwiki.apache.org/confluence/display/GORA/Google+Summer+of+Code
[2] http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
[3] https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[4] http://research.microsoft.com/pubs/63785/eurosys07.pdf
[5] http://dl.acm.org/citation.cfm?id=2523633
[6] http://furkankamaci.com/

