I think the problem is that XMPBox was written for PDF/A checking, so it
fails on XMP packets that do not conform to PDF/A. For example, file
000142.pdf uses the schema http://ns.adobe.com/pdfx/1.3/, which is not
allowed for PDF/A:
http://www.pdfa.org/wp-content/uploads/2011/08/tn0008_predefined_xmp_properties_in_pdfa-1_2008-03-20.pdf
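To make the point concrete, here is a rough sketch of how one could list the namespaces in an XMP packet that fall outside the predefined PDF/A-1 schemas. It uses only the JDK's DOM parser, not XMPBox. The class name, the method name and the KNOWN set are my own illustration, and the set should be checked against the technical note above. The sketch also ignores PDF/A extension schema descriptions, which can legitimately declare further namespaces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of XMPBox or PDFBox.
final class PdfaNamespaceCheck {

    // Assumption: predefined PDF/A-1 schemas plus the RDF and xmpmeta
    // wrapper namespaces. Verify this list against the technical note.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "adobe:ns:meta/",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "http://ns.adobe.com/pdf/1.3/",
            "http://purl.org/dc/elements/1.1/",
            "http://ns.adobe.com/xap/1.0/",
            "http://ns.adobe.com/xap/1.0/rights/",
            "http://ns.adobe.com/xap/1.0/mm/",
            "http://ns.adobe.com/xap/1.0/bj/",
            "http://ns.adobe.com/xap/1.0/t/pg/",
            "http://ns.adobe.com/photoshop/1.0/",
            "http://ns.adobe.com/tiff/1.0/",
            "http://ns.adobe.com/exif/1.0/"));

    // Returns every namespace URI declared with an xmlns:prefix attribute
    // anywhere in the packet that is not in KNOWN.
    static Set<String> unknownNamespaces(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        Set<String> unknown = new TreeSet<>();
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                if (attr.getNodeName().startsWith("xmlns:")
                        && !KNOWN.contains(attr.getNodeValue())) {
                    unknown.add(attr.getNodeValue());
                }
            }
        }
        return unknown;
    }
}
```

A packet that declares xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/" would have that URI reported by unknownNamespaces, which corresponds to the namespace XMPBox complains about for 000142.pdf.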
And no, there are no plans for anything on XMP at this time...
Tilman
On 07.03.2016 at 19:31, Allison, Timothy B. wrote:
All,
When we migrate to PDFBox 2.x over on Tika, I'd much prefer to switch from
our current reliance on jempbox to XMPBox. I recently extracted ~70k XMPs from
PDFs with PDFBox 2.0.0-SNAPSHOT, and when I ran XMPBox's parser, there were
exceptions on roughly 40% of the XMPs.
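For anyone who wants to repeat this kind of count, a minimal harness could look roughly like the sketch below. It is not the code used for this run. The XmpParser interface and the class name are hypothetical stand-ins so that the example runs without XMPBox on the classpath. With XMPBox one would presumably plug in something like xmp -> new DomXmpParser().parse(xmp), which is an assumption to check against the version in use.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical harness for illustration; not part of XMPBox or Tika.
final class XmpExceptionTally {

    // Stand-in for the real parser call, so the sketch has no dependencies.
    @FunctionalInterface
    interface XmpParser {
        void parse(byte[] xmp) throws Exception;
    }

    // Parses every packet and counts the failures by exception message.
    // Packets that parse cleanly are not counted.
    static Map<String, Integer> tally(List<byte[]> packets, XmpParser parser) {
        Map<String, Integer> counts = new TreeMap<>();
        for (byte[] packet : packets) {
            try {
                parser.parse(packet);
            } catch (Exception e) {
                // Exceptions without a message are grouped by class name.
                String key = e.getMessage() != null
                        ? e.getMessage()
                        : "no message (" + e.getClass().getSimpleName() + ")";
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Grouping by the raw message keeps the namespace or property name in the key, so each distinct namespace gets its own row in the counts.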
I’m including a table below of the counts of exception messages. Are there
any plans to make XMPBox more lenient or is this what we can expect going
forward?
As always, I’m more than happy to help with files and tests. Let me know
what I can do.
Cheers,
Tim
No XmpParsingException on 42,022 files.
Exceptions:
Count  Message
13403  Cannot find a definition for the namespace http://ns.adobe.com/pdfx/1.3/
 3710  Type 'originalDocumentID' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
 3113  Missing pdfaSchema:property in type definition
 2867  Expecting namespace 'adobe:ns:meta/' and found 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  927  Invalid array type, expecting Seq and found Bag [prefix=dc; name=creator]
  723  Invalid array type, expecting Alt and found Seq [prefix=dc; name=description]
  710  Cannot find a definition for the namespace http://ns.adobe.com/xmp/InDesign/private
  654  Invalid array type, expecting Bag and found Seq [prefix=dc; name=subject]
  522  Cannot find a definition for the namespace http://ns.adobe.com/AcrobatAdhocWorkflow/1.0/
  492  Failed to parse
  370  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=date]
  262  Cannot find a definition for the namespace http://ns.adobe.com/illustrator/1.0/
  188  Cannot find a definition for the namespace http://ns.adobe.com/xfa/promoted-desc/
  144  Failed to instanciate property in xmp:CreateDate
  125  Schema is not set in this document : http://www.w3.org/1999/02/22-rdf-syntax-ns#
   94  Expecting local name 'xmpmeta' and found 'xapmeta'
   84  Cannot find a definition for the namespace http://www.rwjf.org/rwjf/1.0
   74  Failed to instanciate property in xap:CreateDate
   68  Invalid array definition, expecting Bag and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=language]
   49  Invalid array definition, expecting Alt and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=title]
   46  Cannot find a definition for the namespace http://www.sap.com
   33  Failed to instanciate property in exif:ColorSpace
   28  Failed to instanciate property in xmpMM:History
   26  xmp should start with a processing instruction
   24  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.0/
   21  Cannot find a definition for the namespace http://www.npes.org/pdfx/ns/id/
   14  Cannot find a definition for the namespace http://ns.InsiderSoftware.com/fontlist/1.0/
   14  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=dc; name=creator]
   12  Failed to instanciate property in xmp:MetadataDate
   10  Cannot find a definition for the namespace http://ns.xinet.com/webnative/private/1.0/
   10  Failed to instanciate property in xap:ModifyDate
   10  Failed to instanciate property in xmp:ModifyDate
    9  Type 'params' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceEvent#
    8  Invalid array type, expecting Seq and found Bag [prefix=xmpMM; name=History]
    8  Type 'documentName' not defined in http://ns.adobe.com/xap/1.0/sType/ResourceRef#
    7  Cannot find a definition for the namespace http://www.day.com/dam/1.0
    7  Cannot find a definition for the namespace ptc
    6  Failed to instanciate property in xapMM:History
    5  Invalid array definition, expecting Seq and found com.sun.org.apache.xerces.internal.dom.DeferredTextImpl [prefix=tiff; name=YCbCrPositioning]
    5  Schema is not set in this document : http://purl.org/dc/elements/1.1/
    4  Cannot find a definition for the namespace http://www.extensis.com/meta/FontSense/
    4  Excepted xpacket 'end' attribute (must be present and placed in first)
    3  Invalid array type, expecting Seq and found Bag [prefix=photoshop; name=TextLayers]
    3  Schema is not set in this document : http://ns.adobe.com/xap/1.0/
    2  no message (NPE)
    2  Cannot find a definition for the namespace http://laserfiche.com/xmp/schema/1.0/
    2  Cannot find a definition for the namespace http://ns.adobe.com/AdobeFormsCentralWorkflow/1.0/
    2  Cannot find a definition for the namespace http://ns.adobe.com/camera-raw-settings/1.0/
    2  Failed to instanciate property in xapRights:Marked
    2  Invalid array type, expecting Alt and found Bag [prefix=dc; name=title]
    2  Invalid array type, expecting Alt and found Seq [prefix=dc; name=title]
    2  Invalid array type, expecting Seq and found Alt [prefix=dc; name=creator]
    1  Cannot find a definition for the namespace http://ns.cambridgeassociates.com/status/1.0/
    1  Cannot find a definition for the namespace http://ns.computershare.com.au/ccs/1.0/
    1  Cannot find a definition for the namespace http://ns.esko-graphics.com/grinfo/1.0/
    1  Cannot find a definition for the namespace http://ns.tripletriangle.com/ns/tripletri/
    1  Cannot find a definition for the namespace http://prismstandard.org/namespaces/basic/2.1/
    1  Cannot find a definition for the namespace http://www.aiim.org/pdfa/ns/id.html
    1  Cannot find a definition for the namespace http://www.aiim.org/pdfe/ns/id/
    1  Cannot find a definition for the namespace http://www.enfocus.com/ns/CertifiedPDF/2.0/
    1  Cannot find a definition for the namespace http://www.northplains.com/xmpnps/cov/1.0/
    1  Failed to instanciate property in xmpRights:Marked
    1  Invalid array type, expecting Seq and found Bag [prefix=dc; name=date]
    1  This namespace is not a schema or a structured type : http://ns.adobe.com/xap/1.0/sType/Job#