All,

  I recently extracted the metadata keys from 1 million files in our
regression corpus and grouped them by key.  The resulting counts give
some insight into which metadata keys are most common.

  I've included two views: one looks at overall counts, and the other
breaks down metadata keys by MIME type.
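
  For anyone who wants to reproduce this kind of group-by on their own
data, here is a minimal Python sketch.  It is not the code used to
generate the files below.  It assumes each file's extraction result is
available as a (mime type, list of metadata keys) pair, and it counts a
key once per file; the record layout, the function name and the sample
values are all hypothetical.

```python
from collections import Counter, defaultdict


def count_keys(records):
    """Count metadata keys overall and per MIME type.

    records: iterable of (mime_type, metadata_keys) pairs.
    This layout is an assumption made for illustration only.
    """
    overall = Counter()
    by_mime = defaultdict(Counter)
    for mime, keys in records:
        # Assumption: count each key once per file, so a count means
        # "number of files that carry this key".
        for key in set(keys):
            overall[key] += 1
            by_mime[mime][key] += 1
    return overall, by_mime


# Made-up sample records, not taken from the regression corpus.
sample = [
    ("application/pdf", ["dc:title", "pdf:PDFVersion"]),
    ("application/pdf", ["pdf:PDFVersion"]),
    ("image/jpeg", ["dc:title", "tiff:ImageWidth"]),
]

overall, by_mime = count_keys(sample)

# Overall view: keys ordered by descending count.
for key, n in overall.most_common():
    print(n, key)

# Per-MIME view: the same ordering within each MIME type.
for mime in sorted(by_mime):
    for key, n in by_mime[mime].most_common():
        print(mime, n, key)
```

  Keeping both counters in one pass avoids reading the extraction
results twice, which matters at the 1 million file scale.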

  Please let us know if you find anything interesting or have any questions.

https://corpora.tika.apache.org/base/share/metadata-keys-overall-1m.txt.gz
https://corpora.tika.apache.org/base/share/metadata-keys-by-mime-1m.txt.gz

   Best,

            Tim
