Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Bob Paulin Wed, 02 Mar 2016 06:00:14 -0800

Also, as a follow-up: this means that the JournalParser would never have worked in tika-bundle, since the org.apache.cxf.jaxrs.ext.multipart package is required for the GrobidRESTParser to run. Is there a reason this was not included? I'm guessing the cxf-rt-rs-client dependency may have caused problems with other parsers.

Now that the parsers are broken out into projects in the 2.x branch, we could create bundles for each of them, which would allow the JournalParser to have org.apache.cxf.jaxrs.ext.multipart embedded without impacting the other parsers. I've stubbed out what this might look like in the 2.x branch under the tika-parsers-bundle folder. Each bundle has its dependencies embedded and inlined (similar to tika-bundle). I've also provided tests to make sure each bundle starts and has a service registered for each parser. Thoughts on this approach? Tracking this in:


https://issues.apache.org/jira/browse/TIKA-1860

- Bob

On 3/2/2016 7:46 AM, Bob Paulin wrote:

I saw it on the 2.x branch, but now that you mention it's also happening in trunk, I think I see the issue. The change to the PDFParser adds dependencies on the javax.xml.stream package. The tika-bundle currently has that package marked optional:
javax.xml.stream;version="[1.0,2)";resolution:=optional,
This means that the bundle will start without this package. However, it's now required for the PDFParser to work, so my guess is that the PDFParser is not instantiating correctly and parsing is dropping into the JournalParser, which is also coded to handle PDFs. The JournalParser suffers a similar fate, because org.apache.cxf.jaxrs.ext.multipart is optional for the GrobidRESTParser, which gets instantiated in the parse method.
So I tried removing:
javax.xml.stream;version="[1.0,2)";resolution:=optional,
javax.xml.stream.events;version="[1.0,2)";resolution:=optional,
javax.xml.stream.util;version="[1.0,2)";resolution:=optional,
from the tika-bundle/pom.xml, and it worked! Seeing that javax.xml.stream is provided by the JDK, I'm a bit curious what those statements were doing there to begin with. Anyone know?
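As a rough plain-Java approximation (outside OSGi), an optional import lets the bundle start, and the missing package only surfaces when code first touches one of its classes, as in the GrobidRESTParser.parse frame of the trace below. The sketch probes for a class by name without initializing it. `OptionalImportDemo` and `isAvailable` are hypothetical names for illustration, and running on a plain JVM classpath is an assumption; this is not how Felix actually resolves bundle wiring.

```java
public class OptionalImportDemo {

    // Hypothetical helper: returns true if the named class can be loaded
    // by this class's loader, without running its static initializers.
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalImportDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: class absent entirely.
            // LinkageError (e.g. NoClassDefFoundError): class found but a dependency is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // StAX (javax.xml.stream) ships with the JDK, so this prints true on a standard JDK.
        System.out.println(isAvailable("javax.xml.stream.XMLInputFactory"));
        // CXF's multipart classes exist only if CXF's JAX-RS jars are on the classpath,
        // so this prints false on a bare JVM.
        System.out.println(isAvailable("org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"));
    }
}
```

Inside an OSGi framework such as Felix, the lookup typically goes through the bundle's own class loader, so the result depends on the bundle's Import-Package wiring rather than on the JDK alone.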
- Bob

On 3/2/2016 6:26 AM, Allison, Timothy B. wrote:
Anyone have an idea why trunk is now failing? I couldn't find any changes between the last successful build and last night's failures that would explain this.
Test set: org.apache.tika.bundle.BundleIT
-------------------------------------------------------------------------------
Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.997 sec <<< FAILURE!
testTikaBundle(org.apache.tika.bundle.BundleIT)  Time elapsed: 2.374 sec  <<< ERROR!
java.lang.ClassNotFoundException: org.apache.cxf.jaxrs.ext.multipart.ContentDisposition not found by org.apache.tika.bundle [17]
    at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1558)
    at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:79)
    at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1998)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:69)
    at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
-----Original Message-----
From: Hudson (JIRA) [mailto:[email protected]]
Sent: Tuesday, March 01, 2016 9:59 PM
To: [email protected]
Subject: [jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms
[https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174937#comment-15174937]
Hudson commented on TIKA-1857:
------------------------------
UNSTABLE: Integrated in tika-trunk-jdk1.7 #916 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/916/])
TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev dbefe9830b26d05f9ce53503565a069bcc63d7c1)
* tika-parsers/src/test/resources/test-documents/testPDF_XFA_govdocs1_258578.pdf
* tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java
TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev 7c245fa87507cf0887838001c54c65b79b7e7cbc)
* CHANGES.txt
Enhance PDFParser to extract text from XFA forms
------------------------------------------------

                 Key: TIKA-1857
                 URL: https://issues.apache.org/jira/browse/TIKA-1857
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Pascal Essiembre
              Labels: patch
             Fix For: 1.13
         Attachments: 041617_filled_out.pdf, govdocs1_xfas.zip, xfa_in_govdocs1.txt
Extract text from PDF Forms (XFA). Information about XFA: https://en.wikipedia.org/wiki/XFA
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
