Hi,

We have released the TREC Dynamic Domain Polar dataset, which we collected in the Polar domain during 2015-16.
Researchers interested in a rich dataset collected across the Scientific and Deep web can mine its HTML pages, PDF files, images, video, audio, and other formats for scientific insights.

The data is described here:

https://github.com/chrismattmann/trec-dd-polar

It is available from the NSF Arctic Data Center here:

https://arcticdata.io/catalog/#view/doi:10.18739/A2280J

If you use the dataset in your work, please consider citing our TREC paper and/or the DOI of the dataset itself:

@inproceedings{burgess2015trec,
  title={TREC Dynamic Domain: Polar Science.},
  author={Burgess, Annie Bryant and Mattmann, Chris and Totaro, Giuseppe and McGibbney, Lewis John and Ramirez, Paul M},
  booktitle={TREC},
  year={2015}
}

Enjoy!

Cheers,
Chris Mattmann

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++