Hey Chris,

This is happening to me with Tesseract enabled but only on my MacBook. Are you running this on OSX? Been trying to get some time to dig into it as it works perfectly on my Windows and Linux setups.

Cheers,
Dave

On Thu, 24 May 2018, 17:09 Chris Mattmann, <[email protected]> wrote:

> Tim,
>
> Are you seeing this?
>
> Results :
>
> Failed tests:
>
> PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="date" content="2013-05-23T18:30:00Z" />
> <meta name="cp:revision" content="1" />
> <meta name="extended-properties:AppVersion" content="14.0000" />
> <meta name="meta:paragraph-count" content="1" />
> <meta name="meta:word-count" content="16" />
> <meta name="extended-properties:Company" content="" />
> <meta name="Word-Count" content="16" />
> <meta name="dcterms:created" content="2013-05-23T18:30:00Z" />
> <meta name="meta:line-count" content="1" />
> <meta name="Last-Modified" content="2013-05-23T18:30:00Z" />
> <meta name="dcterms:modified" content="2013-05-23T18:30:00Z" />
> <meta name="Last-Save-Date" content="2013-05-23T18:30:00Z" />
> <meta name="meta:character-count" content="96" />
> <meta name="Template" content="Normal.dotm" />
> <meta name="Line-Count" content="1" />
> <meta name="Paragraph-Count" content="1" />
> <meta name="meta:save-date" content="2013-05-23T18:30:00Z" />
> <meta name="meta:character-count-with-spaces" content="111" />
> <meta name="Application-Name" content="Microsoft Office Word" />
> <meta name="modified" content="2013-05-23T18:30:00Z" />
> <meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.wordprocessingml.document" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
> <meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
> <meta name="meta:creation-date" content="2013-05-23T18:30:00Z" />
> <meta name="extended-properties:Application" content="Microsoft Office Word" />
> <meta name="Creation-Date" content="2013-05-23T18:30:00Z" />
> <meta name="xmpTPg:NPages" content="1" />
> <meta name="Character-Count-With-Spaces" content="111" />
> <meta name="Character Count" content="96" />
> <meta name="Page-Count" content="1" />
> <meta name="Revision-Number" content="1" />
> <meta name="Application-Version" content="14.0000" />
> <meta name="extended-properties:Template" content="Normal.dotm" />
> <meta name="publisher" content="" />
> <meta name="meta:page-count" content="1" />
> <meta name="dc:publisher" content="" />
> <title></title>
> </head>
> <body><p class="header" />
> <p class="header" />
> <p class="header" />
> <p>Outer_haystack</p>
> <p>Outer_haystack</p>
> <p><div class="embedded" id="rId8" />
> </p>
> <p>Outer_haystack</p>
> <p />
> <p>Outer_haystack</p>
> <p />
> <p>Outer_haystack</p>
> <p><a name="_GoBack" /></p>
> <p class="footer" />
> <p class="footer" />
> <p class="footer" />
> <p>attached.pdf</p>
> <div class="page"><div class="ocr">dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'
>
> </div>
> </div>
> <p class="header" />
>
> <p class="header" />
>
> <p class="header" />
>
> <p>Haystack</p>
>
> <p>Needle</p>
>
> <p>Haystack</p>
>
> <p><a name="_GoBack" /></p>
>
> <p class="footer" />
>
> <p class="footer" />
>
> <p class="footer" />
>
> <div source="attachment" class="embedded" id="Test.docx" />
> </body></html>
>
> Tests run: 1009, Failures: 1, Errors: 0, Skipped: 30
>
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Tika parent ................................. SUCCESS [ 1.565 s]
> [INFO] Apache Tika core ................................... SUCCESS [ 32.977 s]
> [INFO] Apache Tika parsers ................................ FAILURE [05:52 min]
> [INFO] Apache Tika XMP .................................... SKIPPED
> [INFO] Apache Tika serialization .......................... SKIPPED
> [INFO] Apache Tika batch .................................. SKIPPED
> [INFO] Apache Tika language detection ..................... SKIPPED
> [INFO] Apache Tika application ............................ SKIPPED
> [INFO] Apache Tika OSGi bundle ............................ SKIPPED
> [INFO] Apache Tika translate .............................. SKIPPED
> [INFO] Apache Tika server ................................. SKIPPED
> [INFO] Apache Tika examples ............................... SKIPPED
> [INFO] Apache Tika Java-7 Components ...................... SKIPPED
> [INFO] Apache Tika eval ................................... SKIPPED
> [INFO] Apache Tika Deep Learning (powered by DL4J) ........ SKIPPED
> [INFO] Apache Tika Natural Language Processing ............ SKIPPED
> [INFO] Apache Tika ........................................ SKIPPED
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 06:27 min
> [INFO] Finished at: 2018-05-24T09:04:59-07:00
> [INFO] Final Memory: 72M/1029M
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-parsers: There are test failures.
> [ERROR]
> [ERROR] Please refer to /Users/mattmann/tmp/tika2.0.0/tika-parsers/target/surefire-reports for the individual test results.
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the command
> [ERROR] mvn <goals> -rf :tika-parsers
>
> Keeps failing for me.
>
> nonas:tika2.0.0 mattmann$ java -version
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
> nonas:tika2.0.0 mattmann$
>
> Any ideas?
>
> Cheers,
> Chris
