Hi All,

It's been announced that I've passed the midterm evaluations! Besides my mentors, Lewis and Talat, I await your comments and suggestions about my project during the second half of GSoC. Thank you all again!
Kind Regards,
Furkan KAMACI

On 1 Jul 2015 10:35, "Lewis John Mcgibbney" <[email protected]> wrote:

> This is fantastic.
> Needless to say, the project will be progressing through the midterm.
> Your blogging is very positive for the dissemination of your work.
> I'd also like to extend a personal thank you to Talat. Excellent job, and on
> behalf of the community here, an excellent effort to drive this GSoC
> project so far only halfway through :).
> Looking forward to committing the initial patches into the master branch and
> also your LogManagerSpark, which will lower the barrier to adopting the
> module.
> Thanks
> Lewis
>
> On Wednesday, July 1, 2015, Furkan KAMACI <[email protected]> wrote:
>
>> Hi,
>>
>> First of all, I would like to thank you all. As you know, I was accepted
>> to GSoC 2015 with my proposal to develop Spark backend support for Gora
>> (GORA-386), and it is now time for midterm evaluations. I want to share
>> the current progress of my project and my midterm report as well.
>>
>> During my GSoC period, I've blogged at my personal website
>> (http://furkankamaci.com/) and created a fork of Apache Gora's master
>> branch to work on: https://github.com/kamaci/gora
>>
>> During the community bonding period, I read the Apache Gora documentation
>> and source code to become more familiar with the project. I analyzed
>> related projects, including Apache Flink and Apache Crunch, to inform
>> implementing a Spark backend for Apache Gora. I also picked up an issue
>> from Jira (https://issues.apache.org/jira/browse/GORA-262) and fixed it.
>>
>> In the coding period, since implementing this project requires an
>> understanding of Apache Spark's infrastructure, I started by analyzing
>> Spark's first papers.
>> I analyzed "Spark: Cluster Computing with Working Sets"
>> (http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf) and
>> "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
>> In-Memory Cluster Computing"
>> (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf). I
>> published two posts at my personal blog, on Spark and Cluster Computing
>> (http://furkankamaci.com/spark-and-cluster-computing/) and Resilient
>> Distributed Datasets
>> (http://furkankamaci.com/resilient-distributed-datasets-rdds/). I
>> followed the Apache Spark documentation and developed examples to
>> analyze RDDs.
>>
>> I analyzed Apache Gora's GoraInputFormat class and Spark's
>> newAPIHadoopRDD method, and implemented an example application that
>> reads data from HBase.
>>
>> Apache Gora supports reading/writing data from/to Hadoop files, and
>> Spark has a method for generating an RDD compatible with Hadoop files.
>> So the architecture is designed to create a bridge between
>> GoraInputFormat and RDDs, since both support Hadoop files.
>>
>> I created a base class for the Apache Gora and Spark integration, named
>> GoraSparkEngine. It has initialize methods that take a Spark context, a
>> data store, and an optional Hadoop configuration, and return an RDD.
>>
>> After implementing a base for the GoraSpark engine, I developed a new
>> example aligned with LogAnalytics, named LogAnalyticsSpark. I developed
>> the map and reduce parts (except for writing results into the database),
>> which do the same thing as LogAnalytics plus something more, i.e.
>> printing the number of lines in tables.
>>
>> Once we get an RDD from the GoraSpark engine, we can operate on it just
>> as on any other RDD that was not created via Apache Gora. The whole code
>> can be checked at the code base: https://github.com/kamaci/gora
>>
>> Project progress is ahead of the proposed timeline so far.
>> The GoraInputFormat-to-RDD transformation is done, and it has been shown
>> that map, reduce, and other methods work properly on that kind of RDD.
>>
>> Before the next steps, I am planning to design the overall architecture
>> according to feedback from the community (there are some prerequisites
>> when designing the architecture, e.g. the configuration of a Spark
>> context cannot be changed after the context has been initialized).
>>
>> Once the necessary functionality is implemented, examples, tests, and
>> documentation will follow. After that, if I have extra time, I'm
>> planning to run a performance benchmark comparing Apache Gora with
>> Hadoop MapReduce, plain Hadoop MapReduce, plain Apache Spark, and Apache
>> Gora with Spark.
>>
>> Special thanks to Lewis and Talat. I should also mention that it is a
>> real chance to be able to talk with your mentor face to face. Talat and
>> I met many times, and he helped me a lot with how Hadoop and Apache Gora
>> work.
>>
>> PS: I've attached my midterm report; my previous reports can be found
>> here:
>>
>> https://cwiki.apache.org/confluence/display/GORA/Spark+Backend+Support+for+Gora+%28GORA-386%29+Reports
>>
>> Kind Regards,
>> Furkan KAMACI
>
>
> --
> *Lewis*
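For readers following the thread, the map/reduce shape that LogAnalyticsSpark applies to a Gora-backed RDD (extract a key from each record, then sum counts per key, as Spark's reduceByKey does) can be sketched in plain Python with no Spark dependency. This is only an illustrative sketch: the log line format, field positions, and variable names here are assumptions for the example, not Gora's actual schema or API.

```python
from collections import defaultdict

# Hypothetical log records: "timestamp url" pairs (illustrative data only).
log_lines = [
    "2015-07-01T10:00 /index.html",
    "2015-07-01T10:05 /about.html",
    "2015-07-01T10:07 /index.html",
]

# Map step: emit a (url, 1) pair for every log record.
pairs = [(line.split()[1], 1) for line in log_lines]

# Reduce step: sum the counts per url, mirroring what reduceByKey
# would do on an RDD obtained from the GoraSpark engine.
counts = defaultdict(int)
for url, n in pairs:
    counts[url] += n

print(dict(counts))  # -> {'/index.html': 2, '/about.html': 1}
```

On a real RDD returned by GoraSparkEngine, the same two steps would likely be expressed with Spark's mapToPair followed by reduceByKey, with the values coming from the Gora data store rather than an in-memory list.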

