[
https://issues.apache.org/jira/browse/SPARK-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699209#comment-14699209
]
Xusen Yin commented on SPARK-9941:
----------------------------------
Hi [~mengxr], I am planning to work on the SF crime dataset:
https://www.kaggle.com/c/sf-crime
It looks simple and interesting. I'll try to keep all the operators inside the
ML package. Please create a subtask and assign it to me.
> Try ML pipeline API on Kaggle competitions
> ------------------------------------------
>
> Key: SPARK-9941
> URL: https://issues.apache.org/jira/browse/SPARK-9941
> Project: Spark
> Issue Type: Umbrella
> Components: ML
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
>
> This is an umbrella JIRA to track some fun tasks :)
> We have built many features under the ML pipeline API, and we want to see how
> it works on real-world datasets, e.g., Kaggle competition datasets
> (https://www.kaggle.com/competitions). We want to invite community members to
> help test. The goal is NOT to win the competitions but to provide code
> examples and to identify missing features and other issues that can help
> shape the roadmap.
> For people who are interested, please do the following:
> 1. Create a subtask (or leave a comment if you cannot create a subtask) to
> claim a Kaggle dataset.
> 2. Use the ML pipeline API to build and tune an ML pipeline that works for
> the Kaggle dataset.
> 3. Post the code as a gist (https://gist.github.com/) and provide the link
> here.
> 4. Report missing features, issues, running times, and accuracy.