All,

  I'm scraping XMPs out of our corpus and placing them here as standalone files:

https://corpora.tika.apache.org/base/xmps/

  I've binned the files roughly by the container file's MIME
type, e.g. https://corpora.tika.apache.org/base/xmps/pdf/

  The process is still running, and I view this as a first draft.
Please let me know if there's anything I can do to make these data
easier to use/more useful or if you see any problems.

  Cheers,

             Tim
