Re: XMPs...all you could possibly want...and more!

Leonard Rosenthol Wed, 17 Mar 2021 10:17:34 -0700

Are you only pulling document-level XMP?  If so, could you extend it to support 
object-level metadata as well?   I, for one, would love to get insight into the 
use of object-level metadata - what objects are they attached to, what are they 
being used for, etc.


Leonard

On 3/17/21, 11:37 AM, "Tim Allison" <talli...@apache.org> wrote:

    All,

      I'm scraping XMPs out of our corpus and placing them here as standalone
    files:

    https://corpora.tika.apache.org/base/xmps/

      I've binned the files roughly based on the container file's mime
    type, e.g. https://corpora.tika.apache.org/base/xmps/pdf/

      The process is still running, and I view this as a first draft.
    Please let me know if there's anything I can do to make these data
    easier to use/more useful or if you see any problems.

      Cheers,

                 Tim
