Hi Spark Devs,

An idea developed recently out of a scikit-learn mailing list discussion (http://sourceforge.net/mailarchive/forum.php?thread_name=CAFvE7K5HGKYH9Myp7imrJ-nU%3DpJgeGqcCn3JC0m4MmGWZi35Hw%40mail.gmail.com&forum_name=scikit-learn-general) to hold a coding sprint around Strata in Feb, focused on integration between scikit-learn and PySpark for large-scale machine learning tasks.

Cloudera has kindly agreed to host the sprint, most likely in San Francisco. Ideally it would be focused and capped at around 10 people. The sprint is meant as a prototyping session rather than a teaching workshop for newcomers, so it would be great to have developers and users with deep knowledge of PySpark (Josh especially :) and/or scikit-learn attend.

Hopefully we can get some people from the Spark community involved, and Olivier will drum up support from the scikit-learn community.

All the best, and hope to see you there (though I will likely only be able to join remotely).

Nick