Hi Leonard,
  I'm literally just scraping bytes out of files for now without any
parsing...so if the XMP is concealed in a compressed stream or
something more interesting, I'm not grabbing it.  I'm also not
tracking which XMP is associated with which object.
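
  As an illustration of the byte-scraping approach described above, here
is a minimal sketch in Python. It is not the code actually run against
the corpus; the function name scrape_xmp_packets is hypothetical. It
assumes only that an XMP packet is delimited by the standard
<?xpacket begin=...?> and <?xpacket end=...?> processing instructions,
and it will miss any packet that sits inside a compressed stream.

```python
import re

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
# processing instructions. Match the shortest span from a "begin" marker
# through the close of the next "end" marker.
XPACKET = re.compile(
    rb"<\?xpacket\s+begin=.*?<\?xpacket\s+end=.*?\?>",
    re.DOTALL,
)


def scrape_xmp_packets(data: bytes) -> list:
    """Return every raw XMP packet found in `data`.

    No container parsing is done, so packets hidden in compressed
    streams are not found, and nothing records which object a packet
    belongs to.
    """
    return [m.group(0) for m in XPACKET.finditer(data)]


if __name__ == "__main__":
    sample = (
        b"...binary junk..."
        b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
        b"<x:xmpmeta xmlns:x='adobe:ns:meta/'/>"
        b"<?xpacket end='w'?>"
        b"...more junk..."
    )
    for packet in scrape_xmp_packets(sample):
        print(len(packet), "bytes")
```

  Because the scan works on raw bytes, it applies to any container type,
which is also why it cannot say which object a packet came from.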
  Please forgive me...if I traverse the COSDocument's objects and look
for /Metadata and grab the stream, will that be what you're looking
for?  Or, is there a command-line tool I can run to get what you're
interested in?
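
  To make the /Metadata question concrete, here is a rough sketch of
what "which object is the metadata attached to" could look like. It is
a deliberately simplified stand-in written in Python over raw bytes, not
a COSDocument traversal, and the name find_metadata_owners is
hypothetical. It assumes objects are stored uncompressed (no object
streams) and that "endobj" does not occur inside stream data; a real
pass would use a proper PDF parser.

```python
import re

# An uncompressed indirect object: "12 0 obj ... endobj".
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# An indirect reference stored under the /Metadata key: "/Metadata 7 0 R".
META_REF = re.compile(rb"/Metadata\s+(\d+)\s+(\d+)\s+R")
# The object's /Type name, if it declares one: "/Type /Catalog".
TYPE = re.compile(rb"/Type\s*/(\w+)")


def find_metadata_owners(data: bytes) -> list:
    """List objects that carry a /Metadata reference.

    Each entry records the owning object's (number, generation), its
    /Type name if present, and the (number, generation) of the
    referenced metadata stream.
    """
    owners = []
    for m in OBJ.finditer(data):
        body = m.group(3)
        ref = META_REF.search(body)
        if ref is None:
            continue
        type_match = TYPE.search(body)
        owners.append({
            "object": (int(m.group(1)), int(m.group(2))),
            "type": type_match.group(1).decode("ascii") if type_match else None,
            "metadata": (int(ref.group(1)), int(ref.group(2))),
        })
    return owners
```

  An entry whose type is Catalog would correspond to document-level
XMP; anything else would be object-level metadata.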
  Thank you.

  Cheers,

              Tim

On Wed, Mar 17, 2021 at 1:17 PM Leonard Rosenthol
<lrose...@adobe.com.invalid> wrote:
>
> Are you only pulling document-level XMP?  If so, could you extend it to 
> support object-level metadata as well?   I, for one, would love to get 
> insight into the use of object-level metadata - what objects are they 
> attached to, what are they being used for, etc.
>
> Leonard
>
> On 3/17/21, 11:37 AM, "Tim Allison" <talli...@apache.org> wrote:
>
>     All,
>
>       I'm scraping XMPs out of our corpus and placing them here as
>     standalone files:
>
>     https://corpora.tika.apache.org/base/xmps/
>
>       I've binned the files roughly based on the container file's mime
>     type, e.g. https://corpora.tika.apache.org/base/xmps/pdf/
>
>       The process is still running, and I view this as a first draft.
>     Please let me know if there's anything I can do to make these data
>     easier to use/more useful or if you see any problems.
>
>       Cheers,
>
>                  Tim
>
