It wasn't a totally clean pull, but I didn't have anything else there. I had stuff in other branches, but when it failed, I tried it on a clean main. Here's the command line I used:

    mvn clean install -pl :tika-parsers-classic-package -am

Since my pull request was processed with no problem, clearly, it's not happening on that system.

Peter Kronenberg | SENIOR AI ANALYTIC ENGINEER
C: 703.887.5623
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI

-----Original Message-----
From: Tim Allison <[email protected]>
Sent: Thursday, April 15, 2021 12:25 PM
To: <[email protected]> <[email protected]>
Subject: Re: Test failure

Hmmmm....I found a couple of other things that I fixed on Windows just now, but I'm not able to replicate it. Are you getting that failure with a clean pull/clone?

On Thu, Apr 15, 2021 at 11:48 AM Tim Allison <[email protected]> wrote:
> Thank you for sharing!
>
> Not able to replicate on linux...trying my Windows laptop.
>
> Unrelated...there's something really broken with the xhtml in that
> there are two bodies. I can replicate this on linux. Will open an issue...
>
> On Thu, Apr 15, 2021 at 10:04 AM Peter Kronenberg <[email protected]> wrote:
>
>> We’re getting a test failure. I don’t see any recent check-ins that
>> would be causing this, so maybe it’s been there for awhile (I don’t
>> always run the tests)
>>
>> [INFO] Results:
>> [INFO]
>> [ERROR] Failures:
>> [ERROR] TesseractOCRParserTest.testOCROutputsHOCR:105->TikaTest.assertContains:79 <span class="ocrx_word" id="word_1_1" not found in:
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta name="pdf:docinfo:custom:AAPL:Keywords" content="" />
>> <meta name="pdf:PDFVersion" content="1.3" />
>> <meta name="pdf:docinfo:title" content="Presentation1" />
>> <meta name="xmp:CreatorTool" content="PowerPoint" />
>> <meta name="pdf:hasXFA" content="false" />
>> <meta name="access_permission:modify_annotations" content="true" />
>> <meta name="access_permission:can_print_degraded" content="true" />
>> <meta name="AAPL:Keywords" content="" />
>> <meta name="dc:creator" content="grantingersoll" />
>> <meta name="dcterms:created" content="2014-02-08T19:57:12Z" />
>> <meta name="dcterms:modified" content="2014-02-08T19:57:12Z" />
>> <meta name="dc:format" content="application/pdf; version=1.3" />
>> <meta name="pdf:docinfo:creator_tool" content="PowerPoint" />
>> <meta name="access_permission:fill_in_form" content="true" />
>> <meta name="pdf:docinfo:keywords" content="" />
>> <meta name="pdf:docinfo:modified" content="2014-02-08T19:57:12Z" />
>> <meta name="pdf:encrypted" content="false" />
>> <meta name="dc:title" content="Presentation1" />
>> <meta name="cp:subject" content="" />
>> <meta name="pdf:docinfo:subject" content="" />
>> <meta name="pdf:hasMarkedContent" content="false" />
>> <meta name="Content-Type" content="application/pdf" />
>> <meta name="pdf:docinfo:creator" content="grantingersoll" />
>> <meta name="dc:subject" content="" />
>> <meta name="dc:subject" content="" />
>> <meta name="dc:subject" content="" />
>> <meta name="dc:subject" content="" />
>> <meta name="pdf:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
>> <meta name="access_permission:extract_for_accessibility" content="true" />
>> <meta name="access_permission:assemble_document" content="true" />
>> <meta name="xmpTPg:NPages" content="1" />
>> <meta name="pdf:hasXMP" content="false" />
>> <meta name="access_permission:extract_content" content="true" />
>> <meta name="access_permission:can_print" content="true" />
>> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser" />
>> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.pdf.PDFParser" />
>> <meta name="meta:keyword" content="" />
>> <meta name="access_permission:can_modify" content="true" />
>> <meta name="pdf:docinfo:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
>> <meta name="pdf:docinfo:created" content="2014-02-08T19:57:12Z" />
>> <title>Presentation1</title>
>> </head>
>> <body><div class="page"><p />
>> <img src="embedded:image0.png" alt="image0.png" /></div>
>> </body></html><html xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta name="Transparency Alpha" content="none" />
>> <meta name="tiff:ImageLength" content="261" />
>> <meta name="Compression CompressionTypeName" content="deflate" />
>> <meta name="Data BitsPerSample" content="8 8 8" />
>> <meta name="Data PlanarConfiguration" content="PixelInterleaved" />
>> <meta name="Dimension VerticalPixelSize" content="0.35273367" />
>> <meta name="IHDR" content="width=934, height=261, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none" />
>> <meta name="embeddedResourceType" content="INLINE" />
>> <meta name="Chroma ColorSpaceType" content="RGB" />
>> <meta name="tiff:BitsPerSample" content="8 8 8" />
>> <meta name="Content-Type" content="image/png" />
>> <meta name="height" content="261" />
>> <meta name="pHYs" content="pixelsPerUnitXAxis=2835, pixelsPerUnitYAxis=2835, unitSpecifier=meter" />
>> <meta name="Dimension PixelAspectRatio" content="1.0" />
>> <meta name="resourceName" content="image0.png" />
>> <meta name="pdf:hasXMP" content="false" />
>> <meta name="Compression NumProgressiveScans" content="1" />
>> <meta name="Content-Type-Parser-Override" content="image/ocr-png" />
>> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser" />
>> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.image.ImageParser" />
>> <meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.ocr.TesseractOCRParser" />
>> <meta name="Dimension HorizontalPixelSize" content="0.35273367" />
>> <meta name="Chroma BlackIsZero" content="true" />
>> <meta name="Compression Lossless" content="true" />
>> <meta name="X-TIKA:embedded_depth" content="1" />
>> <meta name="width" content="934" />
>> <meta name="Dimension ImageOrientation" content="Normal" />
>> <meta name="X-TIKA:embedded_resource_path" content="/image0.png" />
>> <meta name="tiff:ImageWidth" content="934" />
>> <meta name="Chroma NumChannels" content="3" />
>> <meta name="Data SampleFormat" content="UnsignedIntegral" />
>> <title></title>
>> </head>
>> <body /></html>
>> [INFO]
>> [ERROR] Tests run: 305, Failures: 1, Errors: 0, Skipped: 10
>> [INFO]
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Reactor Summary for Apache Tika parent 2.0.0-SNAPSHOT:
>> [INFO]
>> [INFO] Apache Tika parent ................................. SUCCESS [  2.952 s]
>> [INFO] Apache Tika core ................................... SUCCESS [ 37.037 s]
>> [INFO] tika-parsers ....................................... SUCCESS [  0.225 s]
>> [INFO] Apache Tika classic parser modules and package ..... SUCCESS [  0.500 s]
>> [INFO] Apache Tika classic parser modules ................. SUCCESS [  0.261 s]
>> [INFO] tika-parser-html-commons ........................... SUCCESS [  1.773 s]
>> [INFO] tika-parser-digest-commons ......................... SUCCESS [  0.998 s]
>> [INFO] tika-parser-mail-commons ........................... SUCCESS [  1.627 s]
>> [INFO] tika-parser-xmp-commons ............................ SUCCESS [  2.008 s]
>> [INFO] tika-parser-zip-commons ............................ SUCCESS [  2.405 s]
>> [INFO] tika-parser-image-module ........................... SUCCESS [  4.140 s]
>> [INFO] tika-parser-ocr-module ............................. SUCCESS [ 16.227 s]
>> [INFO] tika-parser-audiovideo-module ...................... SUCCESS [  2.998 s]
>> [INFO] tika-parser-text-module ............................ SUCCESS [  3.578 s]
>> [INFO] tika-parser-code-module ............................ SUCCESS [  3.739 s]
>> [INFO] tika-parser-html-module ............................ SUCCESS [  3.842 s]
>> [INFO] tika-parser-font-module ............................ SUCCESS [  2.291 s]
>> [INFO] tika-parser-xml-module ............................. SUCCESS [  2.637 s]
>> [INFO] tika-parser-microsoft-module ....................... SUCCESS [ 46.829 s]
>> [INFO] tika-parser-pkg-module ............................. SUCCESS [  3.862 s]
>> [INFO] tika-parser-pdf-module ............................. SUCCESS [ 15.538 s]
>> [INFO] tika-parser-apple-module ........................... SUCCESS [  3.497 s]
>> [INFO] tika-parser-cad-module ............................. SUCCESS [  2.195 s]
>> [INFO] tika-parser-mail-module ............................ SUCCESS [  9.893 s]
>> [INFO] tika-parser-miscoffice-module ...................... SUCCESS [  8.474 s]
>> [INFO] tika-parser-news-module ............................ SUCCESS [  1.982 s]
>> [INFO] tika-parser-crypto-module .......................... SUCCESS [  2.624 s]
>> [INFO] Apache Tika classic parser package ................. FAILURE [02:15 min]
>> [INFO] ------------------------------------------------------------------------
>> [INFO] BUILD FAILURE
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Total time: 05:21 min
>> [INFO] Finished at: 2021-04-15T10:00:49-04:00
>> [INFO] ------------------------------------------------------------------------
>> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project tika-parsers-classic-package: There are test failures.
>> [ERROR]
>> [ERROR] Please refer to C:\tika\tika-parsers\tika-parsers-classic\tika-parsers-classic-package\target\surefire-reports for the individual test results.
>> [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
>> [ERROR] -> [Help 1]
>> [ERROR]
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> [ERROR]
>> [ERROR] For more information about the errors and possible solutions, please read the following articles:
>> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>> [ERROR]
>> [ERROR] After correcting the problems, you can resume the build with the command
>> [ERROR] mvn <args> -rf :tika-parsers-classic-package
>>
>> c:\tika>
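
A side note on the "two bodies" remark in the thread: the quoted output contains two complete <html> documents back to back, and the hOCR word span the test looks for is in neither. Below is a minimal sketch of how one might check both facts against the XHTML string a test captures. The class name XhtmlOutputCheck, the countOccurrences helper, and the heavily shortened sample string are all hypothetical and are not part of Tika or its test suite.

```java
public class XhtmlOutputCheck {

    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the parser output quoted in the thread,
        // with the <meta> elements left out for brevity.
        String xhtml = "<html><head></head><body><div class=\"page\"><p />"
                + "<img src=\"embedded:image0.png\" alt=\"image0.png\" /></div>"
                + "</body></html><html><head></head><body /></html>";

        // More than one root or body suggests two documents were concatenated.
        System.out.println("html roots: " + countOccurrences(xhtml, "<html"));
        System.out.println("bodies: " + countOccurrences(xhtml, "<body"));

        // The failing assertion looked for this hOCR fragment.
        String hocrWord = "<span class=\"ocrx_word\" id=\"word_1_1\"";
        System.out.println("has hOCR word span: " + xhtml.contains(hocrWord));
    }
}
```

Counting opening tags as plain substrings is crude, but it works even when the output is not well-formed XML, which is exactly the case when two documents are concatenated and an XML parser would reject the input.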
