> This year, the Billion Triple Challenge data set consists of 2 billion
> triples. The dataset was crawled during May/June 2011 using a random sample
> of URIs from the BTC 2010 dataset as seed URIs. Lots of thanks to Andreas
> Harth for all his effort put into crawling the web to compile this dataset,
> and to the Karlsruher Institut für Technologie which provided the necessary
> hardware for this labour-intensive task.
On a related note: while nothing can beat a custom job, obviously, I'd like to remind those who don't have said mighty time/money/resources that any amount of data one wants from the repositories in Sindice is freely available for things like this (0 to 20+ billion triples, LOD or non-LOD, microformats, RDFa, custom filtered, etc.). See the TREC 2011 competition http://data.sindice.com/trec2011/download.html (1TB+ of data!) or the recent W3C data analysis that is leading to a new recommendation (http://www.w3.org/2010/02/rdfa/profile/data/), etc. Just trying to help. Congrats, of course, to you guys on the great job with the Semantic Web Challenge, which is a long-standing and great initiative!

Gio