Or, if you're extremely ambitious, work on implementing Spark Streaming in Python.

— Sent from Mailbox for iPhone
On Thu, Dec 19, 2013 at 8:30 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Hi Matt,
>
> If you want to get started looking at Spark, I recommend the following resources:
>
> - Our issue tracker at http://spark-project.atlassian.net contains some issues marked “Starter” that are good places to jump in. You might be able to take one of those and extend it into a bigger project.
> - The “Contributing to Spark” wiki page covers how to send patches and set up development: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> - This talk has an intro to Spark internals (video and slides are in the comments): http://www.meetup.com/spark-users/events/94101942/
>
> For a longer project, here are some possible ones:
>
> - Create a tool that automatically checks which Scala API methods are missing in Python. We had a similar one for Java that was very useful. Even better would be to automatically create wrappers for the Scala ones.
> - Extend the Spark monitoring UI with profiling information (to sample the workers and report where they’re spending their time, or which data structures consume the most memory).
> - Pick and implement a new machine learning algorithm for MLlib.
>
> Matei
>
> On Dec 17, 2013, at 10:43 AM, Matthew Cheah <mcch...@uwaterloo.ca> wrote:
>
>> Hi everyone,
>>
>> During my most recent internship, I worked extensively with Apache Spark, integrating it into a company's data analytics platform. I've now become interested in contributing to Apache Spark.
>>
>> I'm returning to undergraduate studies in January, and there is an academic course that is simply a standalone software engineering project. I was thinking that a contribution to Apache Spark would satisfy my curiosity, help continue supporting the company I interned at, and earn me the academic credits required to graduate, all at the same time. It seems like too good an opportunity to pass up.
>>
>> With that in mind, I have the following questions:
>>
>> 1. At this point, is there any self-contained project that I could work on within Spark? Ideally, I would work on it independently, in about a three-month time frame. This time also needs to accommodate ramping up on the Spark codebase and adjusting to the Scala programming language and its paradigms. The company I worked at primarily used the Java APIs. The output needs to be a technical report describing the project requirements and the design process I took to engineer a solution for those requirements. In particular, it cannot just be a series of haphazard patches.
>> 2. How can I get started with contributing to Spark?
>> 3. Is there a high-level UML diagram or some other design specification for the Spark architecture?
>>
>> Thanks! I hope to be of some help =)
>>
>> -Matt Cheah
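For what it's worth, the core of the API-coverage tool Matei suggests (reporting which Scala RDD methods have no Python counterpart) can be sketched in a few lines. This is just an illustrative simplification, not the real tool: `PySparkRDDStub` and the hard-coded Scala method list are stand-ins, since the actual tool would scrape the Scala API (e.g. from scaladoc) and inspect `pyspark.rdd.RDD` directly:

```python
import inspect

def missing_methods(reference_names, target_cls):
    """Return the reference method names that have no public counterpart on target_cls."""
    target_names = {
        name for name, member in inspect.getmembers(target_cls, callable)
        if not name.startswith("_")
    }
    return sorted(set(reference_names) - target_names)

# Stand-in data: a handful of Scala RDD method names (illustrative, not exhaustive).
scala_rdd_methods = ["map", "filter", "zipPartitions", "treeAggregate"]

class PySparkRDDStub:
    """Toy stand-in for pyspark.rdd.RDD with a deliberately partial API."""
    def map(self, f): ...
    def filter(self, f): ...

print(missing_methods(scala_rdd_methods, PySparkRDDStub))
# → ['treeAggregate', 'zipPartitions']
```

The real project would mostly be in the plumbing around this set difference: extracting the authoritative Scala method list and mapping camelCase/Scala idioms onto their intended Python equivalents before diffing.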