Got it. Thank you. I wanted to confirm that nothing had changed since last
summer (PDFBOX-2855).
Are you taking bug reports for jempbox, or is that entirely EOL'd?
Any recommendations for a somewhat lenient, Apache license-compatible XMP
parser?
Might it make sense to say something about the goals for XmpBox in the README
or in the package javadocs? It is entirely possible that I missed the
warning. ;)
Thank you, again.
Best,
Tim
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Tuesday, March 08, 2016 12:13 PM
To: [email protected]
Subject: Re: roadmap for XMPBox?
I think the problem is that XmpBox was written for PDF/A checking, so it fails
with XMPs that are not PDF/A. For example, file 000142.pdf has the schema
http://ns.adobe.com/pdfx/1.3/ which is not allowed for PDF/A:
http://www.pdfa.org/wp-content/uploads/2011/08/tn0008_predefined_xmp_properties_in_pdfa-1_2008-03-20.pdf
And no, there are no plans for anything on XMP at this time...
Tilman
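For anyone who wants to check in advance which packets will hit the namespace problem described above, here is a minimal sketch that uses only the JDK's DOM parser. It lists the namespaces a packet declares and subtracts a caller-supplied set of known ones. The class and method names (XmpNamespaceLister, declaredNamespaces, unknownNamespaces) are made up for illustration and are not part of XmpBox, and the "known" set is whatever the caller passes in, not XmpBox's actual schema list.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper, not part of XmpBox or PDFBox.
final class XmpNamespaceLister {

    /** Returns every namespace URI declared via xmlns or xmlns:prefix in the packet. */
    static Set<String> declaredNamespaces(String xmpPacket) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmpPacket.getBytes(StandardCharsets.UTF_8)));
            Set<String> namespaces = new TreeSet<>();
            collect(doc.getDocumentElement(), namespaces);
            return namespaces;
        } catch (Exception e) {
            // Surface parse problems as an unchecked exception to keep callers simple.
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** Returns the declared namespaces that are absent from the caller's known set. */
    static Set<String> unknownNamespaces(String xmpPacket, Set<String> known) {
        Set<String> unknown = declaredNamespaces(xmpPacket);
        unknown.removeAll(known);
        return unknown;
    }

    // Walks the element tree and records the value of each namespace declaration.
    private static void collect(Node element, Set<String> out) {
        NamedNodeMap attrs = element.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                    out.add(attr.getNodeValue());
                }
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect(child, out);
            }
        }
    }
}
```

This only reports declarations. It does not reproduce any of XmpBox's type or array checks, so a packet that passes here may still fail in the parser for other reasons.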
On 07.03.2016 at 19:31, Allison, Timothy B. wrote:
> All,
>
> When we migrate to PDFBox 2.x over on Tika, I'd much prefer to switch
> from our current reliance on jempbox to XMPBox. I recently extracted ~70k
> XMPs from PDFs with PDFBox 2.0.0-SNAPSHOT, and when I ran XMPBox's parser,
> there were exceptions on roughly 40% of the XMPs.
>
> I’m including a table below of the counts of exception messages. Are
> there any plans to make XMPBox more lenient or is this what we can expect
> going forward?
>
> As always, I’m more than happy to help with files and tests. Let me know
> what I can do.
>
> Cheers,
>
> Tim
>
> No XmpParsingException on 42,022 files.
>
> Exceptions:
>
> Count  Message
> 13403  Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
>  3710  Type 'originalDocumentID' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>  3113  Missing pdfaSchema:property in type definition
>  2867  Expecting namespace 'adobe:ns:meta/' and found 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>   927  Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
>   723  Invalid array type, expecting Alt and found Seq [prefix=dc; name=description]
>   710  Cannot find a definition for the namespace http://ns.adobe.com/xmp/InDesign/private
>   654  Invalid array type, expecting Bag and found Seq [prefix=dc; name=subject]
>   522  Cannot find a definition for the namespace http://ns.adobe.com/AcrobatAdhocWorkflow/1.0/
>   492  Failed to parse
>   370  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=date]
>   262  Cannot find a definition for the namespace http://ns.adobe.com/illustrator/1.0/
>   188  Cannot find a definition for the namespace http://ns.adobe.com/xfa/promoted-desc/
>   144  Failed to instanciate property in xmp:CreateDate
>   125  Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
>    94  Expecting local name 'xmpmeta' and found 'xapmeta'
>    84  Cannot find a definition for the namespace http://www.rwjf.org/rwjf/1.0
>    74  Failed to instanciate property in xap:CreateDate
>    68  Invalid array definition, expecting Bag and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=language]
>    49  Invalid array definition, expecting Alt and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=title]
>    46  Cannot find a definition for the namespace http://www.sap.com
>    33  Failed to instanciate property in exif:ColorSpace
>    28  Failed to instanciate property in xmpMM:History
>    26  xmp should start with a processing instruction
>    24  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.0/
>    21  Cannot find a definition for the namespace http://www.npes.org/pdfx/ns/id/
>    14  Cannot find a definition for the namespace http://ns.InsiderSoftware.com/fontlist/1.0/
>    14  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=creator]
>    12  Failed to instanciate property in xmp:MetadataDate
>    10  Cannot find a definition for the namespace http://ns.xinet.com/webnative/private/1.0/
>    10  Failed to instanciate property in xap:ModifyDate
>    10  Failed to instanciate property in xmp:ModifyDate
>     9  Type 'params' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceEvent#
>     8  Invalid array type, expecting Seq and found Bag [prefix=xmpMM; name=History]
>     8  Type 'documentName' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
>     7  Cannot find a definition for the namespace http://www.day.com/dam/1.0
>     7  Cannot find a definition for the namespace ptc
>     6  Failed to instanciate property in xapMM:History
>     5  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=tiff; name=YCbCrPositioning]
>     5  Schema is not set in this document : http://purl.org/dc/elements/1.1/
>     4  Cannot find a definition for the namespace http://www.extensis.com/meta/FontSense/
>     4  Excepted xpacket 'end' attribute (must be present and placed in first)
>     3  Invalid array type, expecting Seq and found Bag [prefix=photoshop; name=TextLayers]
>     3  Schema is not set in this document : http://ns.adobe.com/xap/1.0/
>     2  no message (NPE)
>     2  Cannot find a definition for the namespace http://laserfiche.com/xmp/schema/1.0/
>     2  Cannot find a definition for the namespace http://ns.adobe.com/AdobeFormsCentralWorkflow/1.0/
>     2  Cannot find a definition for the namespace http://ns.adobe.com/camera-raw-settings/1.0/
>     2  Failed to instanciate property in xapRights:Marked
>     2  Invalid array type, expecting Alt and found Bag [prefix=dc; name=title]
>     2  Invalid array type, expecting Alt and found Seq [prefix=dc; name=title]
>     2  Invalid array type, expecting Seq and found Alt [prefix=dc; name=creator]
>     1  Cannot find a definition for the namespace http://ns.cambridgeassociates.com/status/1.0/
>     1  Cannot find a definition for the namespace http://ns.computershare.com.au/ccs/1.0/
>     1  Cannot find a definition for the namespace http://ns.esko-graphics.com/grinfo/1.0/
>     1  Cannot find a definition for the namespace http://ns.tripletriangle.com/ns/tripletri/
>     1  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.1/
>     1  Cannot find a definition for the namespace http://www.aiim.org/pdfa/ns/id.html
>     1  Cannot find a definition for the namespace http://www.aiim.org/pdfe/ns/id/
>     1  Cannot find a definition for the namespace http://www.enfocus.com/ns/CertifiedPDF/2.0/
>     1  Cannot find a definition for the namespace http://www.northplains.com/xmpnps/cov/1.0/
>     1  Failed to instanciate property in xmpRights:Marked
>     1  Invalid array type, expecting Seq and found Bag [prefix=dc; name=date]
>     1  This namespace is not a schema or a structured type : http://ns.adobe.com/xap/1.0/sType/Job#
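A table like the one above can be produced by running each extracted packet through the parser and counting the exception messages. The following is a rough sketch of that kind of harness, not the code actually used for these numbers. XmpExceptionTally and its PacketParser interface are hypothetical names. In a real run, PacketParser would presumably wrap XMPBox's DomXmpParser, but the sketch itself depends only on the JDK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical harness, not part of XMPBox, PDFBox, or Tika.
final class XmpExceptionTally {

    /** Stand-in for whichever XMP parser is under test. */
    interface PacketParser {
        void parse(byte[] packet) throws Exception;
    }

    static final String NO_EXCEPTION = "No exception";

    /** Parses every packet and counts outcomes by exception message, largest count first. */
    static Map<String, Integer> tally(List<byte[]> packets, PacketParser parser) {
        Map<String, Integer> counts = new TreeMap<>();
        for (byte[] packet : packets) {
            String key = NO_EXCEPTION;
            try {
                parser.parse(packet);
            } catch (Exception e) {
                // Exceptions without a message (e.g. an NPE) are keyed by class name.
                key = e.getMessage() != null
                        ? e.getMessage()
                        : "no message (" + e.getClass().getSimpleName() + ")";
            }
            counts.merge(key, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        return sorted;
    }

    /** Fraction of packets that raised any exception (0.0 for an empty tally). */
    static double failureRate(Map<String, Integer> tally) {
        int total = 0;
        for (int count : tally.values()) {
            total += count;
        }
        int failures = total - tally.getOrDefault(NO_EXCEPTION, 0);
        return total == 0 ? 0.0 : (double) failures / total;
    }
}
```

Applied to the counts in the table above, 42,022 clean packets against 28,924 with exceptions gives a failure rate of about 0.41, in line with the "roughly 40%" quoted.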