Cool! The way I'd start is perhaps by adding a new Python example job. For example, a good one to implement would be PageRank -- you can look at these slides for a Scala version of it: http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-part-2-amp-camp-2012-standalone-programs.pdf. Another possibility is linear regression. But feel free to also come up with your own.
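
As a rough illustration of what such a PageRank example job would compute, here is a minimal sketch in plain Python (not PySpark, so it runs without a Spark installation). The toy graph, the damping factor of 0.85, the 10 iterations, and the function name `pagerank` are assumptions made for illustration; they are not taken from the slides. A PySpark version would typically express the same per-iteration steps with RDD operations such as `join`, `flatMap`, and `reduceByKey`.

```python
# Plain-Python sketch of the PageRank iteration (no Spark required).
# The graph, damping factor, and iteration count are illustrative assumptions.

def pagerank(links, iterations=10, damping=0.85):
    """Return a dict of page -> rank.

    `links` maps each page to the list of pages it links to.
    """
    # Every page starts with rank 1.0.
    ranks = {page: 1.0 for page in links}

    for _ in range(iterations):
        # Each page splits its current rank evenly among its outgoing links.
        contribs = {page: 0.0 for page in links}
        for page, neighbors in links.items():
            if not neighbors:
                continue  # a page with no outgoing links contributes nothing
            share = ranks[page] / len(neighbors)
            for dest in neighbors:
                contribs[dest] = contribs.get(dest, 0.0) + share

        # New rank = (1 - damping) + damping * (sum of received contributions).
        ranks = {page: (1 - damping) + damping * total
                 for page, total in contribs.items()}

    return ranks


# Toy three-page graph: a -> b, a -> c, b -> c, c -> a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)

for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy graph every page has at least one outgoing link, so the ranks always sum to the number of pages; page `c` ends up ranked above `b` because it receives contributions from both `a` and `b`.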
There are also a number of open Python issues about adding missing API features, but these require a more thorough understanding of how PySpark works and possibly some hacking around in pickled data: https://spark-project.atlassian.net/browse/SPARK-791?jql=component%20%3D%20PySpark%20AND%20status%20%3D%20Open . The easiest one to start with is probably SPARK-838.

Matei

On Jul 30, 2013, at 6:44 AM, Michael Joyce <[email protected]> wrote:

> Hay Matei,
>
> I would love to help on the Python API. I'll start taking a look at that.
> Unfortunately I don't have access to a Windows computer, so I can't be of
> much use there. I would also be more than happy to work on the JVM stuff as
> well. If you have a list of stuff to do there (or it wouldn't take too long to
> compile one), I would gladly take a look.
>
> Thanks for all the help!
>
>
> -- Joyce
>
>
> On Mon, Jul 29, 2013 at 4:17 PM, Matei Zaharia <[email protected]> wrote:
>
>> Hey Michael,
>>
>> Depending on your background, there are quite a few things to do.
>>
>> One general area that we might use more help for, if you have experience
>> there, is the Python API. Part of it can be just to add more examples in
>> Python, e.g., to show how one can use NumPy or SciPy with it. Another thing
>> that would be super useful if you also have access to Windows is this:
>> https://spark-project.atlassian.net/browse/SPARK-649. We want to make
>> Spark very broadly accessible for science work and it sounds like your
>> background at JPL is good for that.
>>
>> Alternatively, if you prefer to work on the Java VM, there are a bunch of
>> internal things to do there too -- I can give an overview of what I'd
>> consider easy to jump into there.
>>
>> Matei
>>
>> On Jul 29, 2013, at 1:03 PM, Michael Joyce <[email protected]> wrote:
>>
>>> Hay Matei,
>>>
>>> Truth be told I haven't had much of a chance to look through JIRA and the
>>> code base to pick a specific part to work on. Is there anything in
>>> particular that needs some work? I'm more than happy to throw some effort
>>> at a specific problem if something needs attention. Otherwise I can just
>>> poke around and try to find a nice niche in which to work so I can help out.
>>>
>>> Thanks much!
>>>
>>> -- Joyce
>>>
>>>
>>> On Mon, Jul 29, 2013 at 10:55 AM, Matei Zaharia <[email protected]> wrote:
>>>
>>>> Hey Michael,
>>>>
>>>> Glad to hear you're interested in helping. Are there specific things you'd
>>>> like to work on? Certainly we will need help with various Apache packaging,
>>>> etc so it's good to have more people with experience at Apache.
>>>>
>>>> Matei
>>>>
>>>> On Jul 29, 2013, at 8:36 AM, Michael Joyce <[email protected]> wrote:
>>>>
>>>>> Hi all!
>>>>>
>>>>> My name is Michael Joyce. I work at JPL and have heard some great things
>>>>> about Spark from Chris Mattmann. I figured I would stop by, say hello, and
>>>>> hopefully throw some helpful contributions at the project.
>>>>>
>>>>> Look forward to helping out!
>>>>>
>>>>> -- Joyce
