Hi Leonard,

If you could provide a sample document with XMP attached to the various
PDF objects you're interested in, I could come up with a quick sample
for Tim.
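For comparison, the raw byte scraping Tim describes below can be sketched without any PDF parser. This is a minimal sketch, assuming every packet carries the `<?xpacket begin=...?>` / `<?xpacket end=...?>` wrapper (common, but not guaranteed) and is UTF-8 encoded. The class name `XmpPacketScanner` is hypothetical. As Tim notes, this approach cannot see packets inside compressed streams and does not record which object a packet belongs to.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: naive scan for XMP packets in raw file bytes.
public class XmpPacketScanner {

    // Opening and closing processing instructions of an XMP packet wrapper.
    // Only the common prefix is matched, because the attribute values vary.
    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    /**
     * Returns every XMP packet that appears verbatim in the raw bytes.
     * Packets inside compressed (e.g. FlateDecode) streams are not found,
     * and no packet is linked to the PDF object that owns it.
     */
    public static List<String> scan(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so string
        // indices are also byte offsets into data.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(BEGIN, from);
            if (start < 0) break;
            int endMarker = text.indexOf(END, start);
            if (endMarker < 0) break;          // truncated packet: stop
            int close = text.indexOf("?>", endMarker);
            if (close < 0) break;
            int stop = close + 2;              // include the trailing "?>"
            // Assumption: the packet is UTF-8 encoded (the common case).
            packets.add(new String(data, start, stop - start, StandardCharsets.UTF_8));
            from = stop;
        }
        return packets;
    }

    public static void main(String[] args) {
        String pdf = "%PDF-1.7\n5 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
                + "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>"
                + "<?xpacket end=\"w\"?>\nendstream\nendobj\n";
        List<String> packets = scan(pdf.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(packets.size()); // prints 1
    }
}
```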

BR
Maruan 

On Wednesday, 17.03.2021 at 13:39 -0400, Tim Allison wrote:
> Hi Leonard,
>   I'm literally just scraping bytes out of files for now, without any
> parsing... so if the XMP is concealed in a compressed stream or
> something more interesting, I'm not grabbing it. I'm also not
> tracking which XMP is associated with which object.
>   Please forgive me... if I traverse the COSDocument's objects,
> look for /Metadata, and grab the stream, will that be what you're
> looking for? Or is there a command-line tool I can run to get what
> you're interested in?
>   Thank you.
> 
>   Cheers,
> 
>               Tim
> 
> On Wed, Mar 17, 2021 at 1:17 PM Leonard Rosenthol
> <lrose...@adobe.com.invalid> wrote:
> > 
> > Are you only pulling document-level XMP?  If so, could you extend
> > it to support object-level metadata as well?   I, for one, would
> > love to get insight into the use of object-level metadata - what
> > objects are they attached to, what are they being used for, etc.
> > 
> > Leonard
> > 
> > On 3/17/21, 11:37 AM, "Tim Allison" <talli...@apache.org> wrote:
> > 
> >     All,
> > 
> >       I'm scraping XMPs out of our corpus and placing them here as
> >     standalone files:
> > 
> >     https://corpora.tika.apache.org/base/xmps/
> > 
> >       I've binned the files roughly based on the container file's
> >     MIME type, e.g.
> >     https://corpora.tika.apache.org/base/xmps/pdf/
> > 
> >       The process is still running, and I view this as a first
> >     draft. Please let me know if there's anything I can do to make
> >     these data easier to use/more useful, or if you see any problems.
> > 
> >       Cheers,
> > 
> >                  Tim
> > 
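On the question of which objects XMP is attached to: the robust route is the one Tim proposes, walking the parsed objects with a real PDF library such as PDFBox and following each /Metadata entry. As a dependency-free approximation, the sketch below scans uncompressed `N G obj ... endobj` bodies and reports every object whose body contains a `/Metadata N G R` reference, together with the first `/Type` it finds. Assumptions: objects are not stored in compressed object streams, and the first `/Type` in a body belongs to the object itself (nested dictionaries can break this). The class name `MetadataRefScanner` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive scan for objects that reference a /Metadata stream.
public class MetadataRefScanner {

    // "N G obj ... endobj" for top-level, uncompressed indirect objects.
    private static final Pattern OBJ =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b(.*?)\\bendobj", Pattern.DOTALL);
    // An indirect reference to a metadata stream, e.g. "/Metadata 5 0 R".
    private static final Pattern META =
            Pattern.compile("/Metadata\\s+(\\d+)\\s+(\\d+)\\s+R");
    // The object's type name, e.g. "/Type /Catalog" (first match only).
    private static final Pattern TYPE = Pattern.compile("/Type\\s*/(\\w+)");

    /** Returns lines like "1 0 R (Catalog) -> 5 0 R", one per reference found. */
    public static List<String> scan(byte[] data) {
        // ISO-8859-1 keeps binary stream data from breaking the decode.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String> refs = new ArrayList<>();
        Matcher obj = OBJ.matcher(text);
        while (obj.find()) {
            String body = obj.group(3);
            Matcher type = TYPE.matcher(body);
            String ownerType = type.find() ? type.group(1) : "?";
            Matcher meta = META.matcher(body);
            while (meta.find()) {
                refs.add(obj.group(1) + " " + obj.group(2) + " R (" + ownerType + ") -> "
                        + meta.group(1) + " " + meta.group(2) + " R");
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        String pdf = "1 0 obj\n<< /Type /Catalog /Metadata 5 0 R >>\nendobj\n"
                + "7 0 obj\n<< /Type /XObject /Subtype /Image /Metadata 8 0 R >>\n"
                + "stream\nxx\nendstream\nendobj\n"
                + "5 0 obj\n<< /Type /Metadata /Subtype /XML >>\n"
                + "stream\n<x:xmpmeta/>\nendstream\nendobj\n";
        // prints:
        // 1 0 R (Catalog) -> 5 0 R
        // 7 0 R (XObject) -> 8 0 R
        for (String ref : scan(pdf.getBytes(StandardCharsets.ISO_8859_1))) {
            System.out.println(ref);
        }
    }
}
```

A per-object report like this answers "what objects are they attached to", but anything stored in an object stream will be missed, which is why parsing with a PDF library is preferable for the corpus run.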

-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827
