Here is one I have handy where there is XMP on the image...
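Leonard's sample carries XMP on an image rather than only at the document level. Recording which object a /Metadata stream belongs to is exactly what Tim says he is not yet tracking. The sketch below is a rough, stdlib-only way to get that association from raw bytes. The class and method names are hypothetical; this is not the PDFBox or Tika API. It only sees objects written uncompressed as `N G obj ... endobj`, so it misses objects packed into compressed object streams (PDF 1.5+). It also labels each owner with the first /Subtype or /Type it finds, which is a heuristic. A real parser, for example walking PDFBox's COSDocument as Tim proposes, would be the robust route.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox or Tika. Only objects written
// uncompressed as "N G obj ... endobj" are visible to it.
public class MetadataOwners {

    private static final Pattern OBJECT =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b(.*?)\\bendobj", Pattern.DOTALL);
    // An indirect reference stored under the /Metadata key, e.g. "/Metadata 5 0 R".
    private static final Pattern METADATA_REF =
            Pattern.compile("/Metadata\\s+(\\d+)\\s+(\\d+)\\s+R\\b");
    private static final Pattern SUBTYPE = Pattern.compile("/Subtype\\s*/(\\w+)");
    private static final Pattern TYPE = Pattern.compile("/Type\\s*/(\\w+)");

    /** Maps "objNum gen" of each owning object to "label -> metadata reference". */
    public static Map<String, String> findOwners(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so binary stream data is harmless.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, String> owners = new LinkedHashMap<>();
        Matcher obj = OBJECT.matcher(text);
        while (obj.find()) {
            String body = obj.group(3);
            Matcher ref = METADATA_REF.matcher(body);
            if (!ref.find()) {
                continue; // no /Metadata entry in this object
            }
            // Rough label: prefer /Subtype (e.g. Image), fall back to /Type (e.g. Catalog).
            String label = firstName(SUBTYPE, body);
            if (label == null) {
                label = firstName(TYPE, body);
            }
            owners.put(obj.group(1) + " " + obj.group(2),
                    (label == null ? "?" : label) + " -> " + ref.group(1) + " " + ref.group(2) + " R");
        }
        return owners;
    }

    private static String firstName(Pattern p, String body) {
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Synthetic input: document-level XMP on the catalog, object-level XMP on an image.
        String pdf = "%PDF-1.7\n"
                + "1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Metadata 3 0 R >>\nendobj\n"
                + "4 0 obj\n<< /Type /XObject /Subtype /Image /Width 1 /Height 1"
                + " /Metadata 5 0 R /Length 3 >>\nstream\nabc\nendstream\nendobj\n";
        findOwners(pdf.getBytes(StandardCharsets.ISO_8859_1))
                .forEach((owner, info) -> System.out.println(owner + ": " + info));
    }
}
```

On the synthetic input in `main`, the catalog (document-level XMP) and the image XObject (object-level XMP) are reported as separate owners, which is the distinction Leonard is asking about.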

On 3/17/21, 1:44 PM, "sahy...@fileaffairs.de" <sahy...@fileaffairs.de> wrote:

    Hi Leonard,

    if you could provide a sample document with XMPs attached to the various
    PDF objects you're interested in, I could come up with a quick sample
    for Tim.

    BR
    Maruan 

    On Wednesday, 17.03.2021 at 13:39 -0400, Tim Allison wrote:
    > Hi Leonard,
    >   I'm literally just scraping bytes out of files for now, without any
    > parsing... so if the XMP is concealed in a compressed stream or
    > something more interesting, I'm not grabbing it.  I'm also not
    > tracking which XMP is associated with which object.
    >   Please forgive me... if I traverse the COSDocument's objects, look
    > for /Metadata, and grab the stream, will that be what you're looking
    > for?  Or is there a command-line tool I can run to get what you're
    > interested in?
    >   Thank you.
    > 
    >   Cheers,
    > 
    >               Tim
    > 
    > On Wed, Mar 17, 2021 at 1:17 PM Leonard Rosenthol
    > <lrose...@adobe.com.invalid> wrote:
    > > 
    > > Are you only pulling document-level XMP?  If so, could you extend
    > > it to support object-level metadata as well?   I, for one, would
    > > love to get insight into the use of object-level metadata - what
    > > objects are they attached to, what are they being used for, etc.
    > > 
    > > Leonard
    > > 
    > > On 3/17/21, 11:37 AM, "Tim Allison" <talli...@apache.org> wrote:
    > > 
    > >     All,
    > > 
    > >       I'm scraping XMPs out of our corpus and placing them here as
    > >     standalone files:
    > > 
    > >     https://corpora.tika.apache.org/base/xmps/
    > > 
    > >       I've binned the files roughly by the container file's MIME
    > >     type, e.g.:
    > > 
    > >     https://corpora.tika.apache.org/base/xmps/pdf/
    > > 
    > >       The process is still running, and I view this as a first
    > >     draft. Please let me know if there's anything I can do to make
    > >     these data easier to use/more useful, or if you see any
    > >     problems.
    > > 
    > >       Cheers,
    > > 
    > >                  Tim
    > > 

    -- 
    Maruan Sahyoun

    FileAffairs GmbH
    Josef-Schappe-Straße 21
    40882 Ratingen

    Tel: +49 (2102) 89497 88
    Fax: +49 (2102) 89497 91
    sahy...@fileaffairs.de
    
    http://www.fileaffairs.de/

    Geschäftsführer: Maruan Sahyoun
    Handelsregister: AG Düsseldorf, HRB 53837
    UST.-ID: DE248275827
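Tim's "scraping bytes out of files" approach can be sketched as a search for the XMP packet wrapper markers `<?xpacket begin=` and `<?xpacket end=...?>`. The sketch also shows why XMP inside a compressed stream is missed. The class name `XmpPacketScraper` and its helpers are hypothetical, the sample bytes are synthetic rather than taken from the corpus, and it assumes the packet wrapper is present. Packets written without the wrapper, or stored in a Flate-compressed stream, will not be found.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch of wrapper-based XMP scraping; no PDF parsing at all.
public class XmpPacketScraper {

    private static final byte[] BEGIN = "<?xpacket begin=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END = "<?xpacket end=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] CLOSE = "?>".getBytes(StandardCharsets.US_ASCII);

    /** Returns each raw byte run from a packet header through the end of its trailer. */
    public static List<byte[]> scrape(byte[] data) {
        List<byte[]> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = indexOf(data, BEGIN, from);
            if (start < 0) break;
            int end = indexOf(data, END, start + BEGIN.length);
            if (end < 0) break;
            // The trailer finishes at the first "?>" after "<?xpacket end=".
            int close = indexOf(data, CLOSE, end + END.length);
            if (close < 0) break;
            int stop = close + CLOSE.length;
            byte[] packet = new byte[stop - start];
            System.arraycopy(data, start, packet, 0, packet.length);
            packets.add(packet);
            from = stop;
        }
        return packets;
    }

    // Naive byte-by-byte search; good enough for a sketch.
    static int indexOf(byte[] hay, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= hay.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // zlib/deflate compression, the same format a /FlateDecode stream uses.
    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) {
            int n = d.deflate(buf);
            out.write(buf, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xmp = "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"></x:xmpmeta>\n"
                + "<?xpacket end=\"w\"?>";
        String pdf = "5 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
                + xmp + "\nendstream\nendobj\n";
        byte[] plain = pdf.getBytes(StandardCharsets.UTF_8);
        // Same packet, but compressed the way a /FlateDecode stream would be.
        byte[] compressed = deflate(xmp.getBytes(StandardCharsets.UTF_8));
        System.out.println("uncompressed: " + scrape(plain).size() + " packet(s)");
        System.out.println("compressed:   " + scrape(compressed).size() + " packet(s)");
    }
}
```

The uncompressed copy yields one packet and the compressed copy yields none, because after Flate encoding the marker bytes no longer appear literally. Catching those cases requires decoding streams, which is where parsing the PDF comes in.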

