[ 
https://issues.apache.org/jira/browse/SPARK-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699209#comment-14699209
 ] 

Xusen Yin commented on SPARK-9941:
----------------------------------

Hi [~mengxr], I am planning to write something for the crime data in SF. 
https://www.kaggle.com/c/sf-crime

It looks simple and interesting. I'll try to keep all the operators inside the 
ML package. Please create a subtask for me.

> Try ML pipeline API on Kaggle competitions
> ------------------------------------------
>
>                 Key: SPARK-9941
>                 URL: https://issues.apache.org/jira/browse/SPARK-9941
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> This is an umbrella JIRA to track some fun tasks :)
> We have built many features under the ML pipeline API, and we want to see how 
> it works on real-world datasets, e.g., Kaggle competition datasets 
> (https://www.kaggle.com/competitions). We want to invite community members to 
> help test. The goal is NOT to win the competitions but to provide code 
> examples and to identify missing features and other issues to help shape the 
> roadmap.
> For people who are interested, please do the following:
> 1. Create a subtask (or leave a comment if you cannot create a subtask) to 
> claim a Kaggle dataset.
> 2. Use the ML pipeline API to build and tune an ML pipeline that works for 
> the Kaggle dataset.
> 3. Paste the code to gist (https://gist.github.com/) and provide the link 
> here.
> 4. Report missing features, issues, running times, and accuracy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
