Hi Dominik,

As you mentioned, it is a pain for each of us to run MIME detection on the files in our corpus to select the ones we're interested in. This is somewhat out of date, but should be reasonable for now:

http://162.242.228.174/mimes/mime_comparisons.html

I'll dump the MIME types into a tab-delimited file (path\tmime) today and post it here:

http://162.242.228.174/metadata/

I think it would also be useful to create subsets: poi, pdf, poi+other_office (msaccess, rtf, odt)... What do you think?

Cheers,

Tim
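
For anyone who wants to build such subsets from the tab-delimited dump, a minimal Python sketch might look like the following. It assumes one path\tmime pair per line; the MIME prefixes in SUBSETS, the helper name select_paths, and the sample rows are illustrative assumptions, not the actual contents of the dump.

```python
import csv
import io

# Hypothetical subset definitions. The MIME prefixes are assumptions for
# illustration and would need adjusting to what the detector actually reports.
SUBSETS = {
    "pdf": ("application/pdf",),
    "poi": (
        "application/msword",
        "application/vnd.ms-",
        "application/vnd.openxmlformats-officedocument.",
    ),
}


def select_paths(lines, prefixes):
    """Yield paths whose MIME type starts with any of the given prefixes."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        path, mime = row[0], row[1].strip().lower()
        if mime.startswith(prefixes):
            yield path


# Made-up sample rows standing in for the real dump.
sample = io.StringIO(
    "a/report.pdf\tapplication/pdf\n"
    "b/letter.doc\tapplication/msword\n"
    "c/page.html\ttext/html\n"
)

for path in select_paths(sample, SUBSETS["poi"]):
    print(path)
```

Matching on prefixes rather than exact strings keeps each subset definition short, at the cost of possibly pulling in related types that were not intended.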
