Or, if you're extremely ambitious, work on implementing Spark Streaming in Python.

On Thu, Dec 19, 2013 at 8:30 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Hi Matt,
> If you want to get started looking at Spark, I recommend the following 
> resources:
> - Our issue tracker at http://spark-project.atlassian.net contains some 
> issues marked “Starter” that are good places to jump into. You might be able 
> to take one of those and extend it into a bigger project.
> - The “contributing to Spark” wiki page covers how to send patches and set up 
> development: 
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark 
> - This talk has an intro to Spark internals (video and slides are in the 
> comments): http://www.meetup.com/spark-users/events/94101942/
> For a longer project, here are some possible ones:
> - Create a tool that automatically checks which Scala API methods are missing 
> in Python; there's a rough sketch of one approach after this list. We had a 
> similar one for Java that was very useful. Even better would be to 
> automatically create wrappers for the Scala ones.
> - Extend the Spark monitoring UI with profiling information (to sample the 
> workers and say where they’re spending time, or what data structures consume 
> the most memory).
> - Pick and implement a new machine learning algorithm for MLlib.
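>
> For the first of those (the Python API coverage checker), here is a rough,
> untested sketch of one way it could work. It is only an illustration: it
> relies on PySpark's internal _jrdd attribute to reach the Scala RDD class
> through the Py4J gateway, and it only compares method names, so treat the
> details as assumptions rather than a description of how the real tool would
> have to work.
>
>     from pyspark import SparkContext
>     from pyspark.rdd import RDD
>
>     sc = SparkContext("local", "api-coverage-check")
>
>     # Public method names exposed by the Python RDD wrapper.
>     python_methods = set(m for m in dir(RDD) if not m.startswith("_"))
>
>     # Method names on the underlying Scala RDD. _jrdd is an internal PySpark
>     # attribute (a JavaRDD); .rdd() returns the wrapped Scala RDD, and the
>     # reflection calls go through the Py4J gateway.
>     scala_rdd = sc.parallelize([1])._jrdd.rdd()
>     scala_methods = set(m.getName() for m in scala_rdd.getClass().getMethods())
>
>     # Anything defined on the Scala side but missing from the Python wrapper.
>     for name in sorted(scala_methods - python_methods):
>         print(name)
>
>     sc.stop()
>
> A real tool would also filter out java.lang.Object methods and Scala-internal
> names (toString, the $-mangled synthetics, and so on), and repeat the same
> comparison for SparkContext and the other user-facing classes.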
> Matei
> On Dec 17, 2013, at 10:43 AM, Matthew Cheah <mcch...@uwaterloo.ca> wrote:
>> Hi everyone,
>> 
>> During my most recent internship, I worked extensively with Apache Spark,
>> integrating it into a company's data analytics platform. I've now become
>> interested in contributing to Apache Spark.
>> 
>> I'm returning to undergraduate studies in January and there is an academic
>> course which is simply a standalone software engineering project. I was
>> thinking that some contribution to Apache Spark would satisfy my curiosity,
>> help continue supporting the company I interned at, and give me the academic
>> credits required to graduate, all at the same time. It seems like too good
>> an opportunity to pass up.
>> 
>> With that in mind, I have the following questions:
>> 
>>   1. At this point, is there any self-contained project that I could work
>>   on within Spark? Ideally, I would work on it independently, in about a
>>   three-month time frame. This time also needs to accommodate ramping up on
>>   the Spark codebase and adjusting to the Scala programming language and
>>   paradigms. The company I worked at primarily used the Java APIs. The output
>>   needs to be a technical report describing the project requirements and the
>>   design process I followed to engineer a solution for those requirements. In
>>   particular, it cannot just be a series of haphazard patches.
>>   2. How can I get started with contributing to Spark?
>>   3. Is there a high-level UML or some other design specification for the
>>   Spark architecture?
>> 
>> Thanks! I hope to be of some help =)
>> 
>> -Matt Cheah
