Re: Sample datasets for MLlib and Graphx

2014-07-03 Thread Nick Pentreath
Take a look at Kaggle competition datasets - https://www.kaggle.com/competitions For svm there are a couple of ad click prediction datasets of pretty large size. For graph stuff the SNAP has large network data: https://snap.stanford.edu/data/ — Sent from Mailbox On Thu, Jul 3, 2014 at

Re: Sample datasets for MLlib and Graphx

2014-07-03 Thread AlexanderRiggers
Nick Pentreath wrote Take a look at Kaggle competition datasets - https://www.kaggle.com/competitions I was looking for files in LIBSVM format and never found something on Kaggle in bigger size. Most competitions I ve seen need data processing and feature generating, but maybe I ve to take a

Re: Sample datasets for MLlib and Graphx

2014-07-03 Thread Nick Pentreath
The Kaggle data is not in libsvm format so you'd have to do some transformation. The Criteo and KDD cup datasets are if I recall fairly large. Criteo ad prediction data is around 2-3GB compressed I think. To my knowledge these are the largest binary classification datasets I've come across