I have been thinking about creating a private S3 bucket, but that would make it impossible to run the tests locally. On the other hand, the licenses of many datasets, such as MovieLens, forbid redistribution, which means making the S3 bucket public is not allowed. We could consider a hybrid solution that first queries the S3 bucket and, if the bucket is not reachable, downloads the file from an alternative address (i.e. the original source).
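
To make the hybrid idea concrete, here is a rough Python sketch of the fallback logic. It is only an illustration under assumptions: the mirror URL and the helper names (`fetch_url`, `download_with_fallback`) are hypothetical, and the real tests would need the equivalent in their own language (the failing one is an R test).

```python
import os
import urllib.request

# Hypothetical mirror location; no such bucket exists yet.
MIRROR_URL = "https://example-mxnet-datasets.s3.amazonaws.com/ml-100k.zip"
# Original source, as used by the failing test.
ORIGINAL_URL = "http://files.grouplens.org/datasets/movielens/ml-100k.zip"


def fetch_url(url, dest, timeout=30):
    """Download url to dest; raises OSError (URLError/HTTPError) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        out.write(resp.read())


def download_with_fallback(urls, dest, fetch=fetch_url):
    """Try each URL in order and return the first one that succeeded."""
    errors = []
    for url in urls:
        try:
            fetch(url, dest)
            return url
        except OSError as exc:  # URLError and HTTPError are subclasses of OSError
            errors.append((url, exc))
            if os.path.exists(dest):
                os.remove(dest)  # drop a partial file before trying the next source
    raise RuntimeError("all sources failed: %r" % errors)


# Example call (not executed here, it needs network access):
# download_with_fallback([MIRROR_URL, ORIGINAL_URL], "ml-100k.zip")
```

With this ordering, a machine that cannot reach or is not authorized for the bucket simply falls through to the original source, so local runs would still work whenever the upstream site is up.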

On Sun, Jan 7, 2018 at 12:29 AM, Marco de Abreu <[email protected]> wrote:

> I could offer to download the dataset and create an S3 bucket to store all
> used datasets. This would also reduce external dependencies.
>
> Wdyt?
>
> -Marco
>
> On 07.01.2018 at 12:26 AM, "kellen sunderland" <[email protected]> wrote:
>
>> FYI PRs are currently failing to build. The R "Matrix Factorization" test
>> is failing to download this dataset:
>> http://files.grouplens.org/datasets/movielens/ml-100k.zip . The site
>> https://grouplens.org/ appears to be down.
>>
>> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
>> PR to skip the test here:
>> https://github.com/apache/incubator-mxnet/pull/9333
>>
>> -Kellen
