All,

I recently extracted the metadata keys from 1 million files in our regression corpus and grouped them by key to count occurrences. The counts show which metadata keys are most common.
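For anyone who wants to run a similar tally on their own files, here is a minimal sketch of that kind of group by, both overall and per MIME type. It is not the code used for the corpus run. The `records` list, its (MIME type, metadata dict) layout, the sample keys, and the `count_keys` helper are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample input: one (mime_type, metadata) pair per file.
# The keys and values here are illustrative, not taken from the corpus.
records = [
    ("application/pdf", {"dc:title": "A", "pdf:PDFVersion": "1.4"}),
    ("application/pdf", {"pdf:PDFVersion": "1.7"}),
    ("image/jpeg", {"tiff:ImageWidth": "640", "dc:title": "B"}),
]


def count_keys(records):
    """Count how many files carry each metadata key, overall and per MIME type."""
    overall = Counter()
    by_mime = defaultdict(Counter)
    for mime, metadata in records:
        # Dict keys are unique, so each key is counted once per file.
        overall.update(metadata.keys())
        by_mime[mime].update(metadata.keys())
    return overall, by_mime


overall, by_mime = count_keys(records)

# Print the overall view, most common keys first.
for key, n in overall.most_common():
    print(f"{key}\t{n}")
```

The per-MIME view can be printed the same way by looping over `by_mime.items()` and calling `most_common()` on each inner counter.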
I've included two views: one gives overall counts, and the other breaks down the metadata keys by MIME type. Please let us know if you find anything interesting or have any questions.

https://corpora.tika.apache.org/base/share/metadata-keys-overall-1m.txt.gz
https://corpora.tika.apache.org/base/share/metadata-keys-by-mime-1m.txt.gz

Best,
Tim
