[dropping Beam on this]

Tim,

Another thing: you can finally download the TREC-DD Polar data, either from the NSF Arctic Data Center (70GB zip) or from Amazon S3, as described here:

http://github.com/chrismattmann/trec-dd-polar/

In case we want to use it as part of our regression tests.

Cheers,
Chris

On 9/22/17, 10:43 AM, "Allison, Timothy B." <[email protected]> wrote:

>>1) We've gathered a TB of data from CommonCrawl and we run regression tests against this TB (thank you, Rackspace, for hosting our VM!) to try to identify these problems.

And if anyone with connections at a big company doing open source + cloud would be interested in floating us some storage and cycles, we'd be happy to move off our single VM to increase coverage and improve the speed of our large-scale regression tests. :D

But seriously, thank you for this discussion and collaboration!

Cheers,
Tim
