Thank you for sharing!

I'm not able to replicate this on Linux; I'll try my Windows laptop next.

Unrelated: there's something really broken with the XHTML, in that the
output contains two bodies.  I can replicate this on Linux.  Will open an issue.
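
For anyone who wants to check their own output for the two-bodies problem, here is a minimal Java sketch. It is not Tika code: the class and method names (BodyCount, countBodies) are made up for illustration, and the sample string is a trimmed-down stand-in for the output quoted below. It just counts opening or self-closing <body> tags in a serialized XHTML string; a well-formed result should give 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
class BodyCount {
    // Matches <body>, <body attr="...">, and <body />, but not </body>.
    private static final Pattern BODY_OPEN = Pattern.compile("<body(\\s[^>]*)?/?>");

    // Returns how many opening (or self-closing) body tags appear in the string.
    static int countBodies(String xhtml) {
        Matcher m = BODY_OPEN.matcher(xhtml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Two complete documents concatenated, as in the failing output.
        String xhtml =
            "<html><head></head><body><div class=\"page\"><p /></div></body></html>"
            + "<html><head></head><body /></html>";
        System.out.println(countBodies(xhtml)); // prints 2
    }
}
```

A regex count is only a quick sanity check on the serialized string; feeding the output to a real XML parser would be the stricter test, since a second root element should make it fail outright.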

On Thu, Apr 15, 2021 at 10:04 AM Peter Kronenberg <[email protected]>
wrote:

> We’re getting a test failure.  I don’t see any recent check-ins that would
> be causing this, so maybe it’s been there for a while (I don’t always run
> the tests).
>
>
>
> [INFO] Results:
>
> [INFO]
>
> [ERROR] Failures:
>
> [ERROR] TesseractOCRParserTest.testOCROutputsHOCR:105->TikaTest.assertContains:79 <span class="ocrx_word" id="word_1_1" not found in:
>
> <html xmlns=http://www.w3.org/1999/xhtml>
>
> <head>
>
> <meta name="pdf:docinfo:custom:AAPL:Keywords" content="" />
>
> <meta name="pdf:PDFVersion" content="1.3" />
>
> <meta name="pdf:docinfo:title" content="Presentation1" />
>
> <meta name="xmp:CreatorTool" content="PowerPoint" />
>
> <meta name="pdf:hasXFA" content="false" />
>
> <meta name="access_permission:modify_annotations" content="true" />
>
> <meta name="access_permission:can_print_degraded" content="true" />
>
> <meta name="AAPL:Keywords" content="" />
>
> <meta name="dc:creator" content="grantingersoll" />
>
> <meta name="dcterms:created" content="2014-02-08T19:57:12Z" />
>
> <meta name="dcterms:modified" content="2014-02-08T19:57:12Z" />
>
> <meta name="dc:format" content="application/pdf; version=1.3" />
>
> <meta name="pdf:docinfo:creator_tool" content="PowerPoint" />
>
> <meta name="access_permission:fill_in_form" content="true" />
>
> <meta name="pdf:docinfo:keywords" content="" />
>
> <meta name="pdf:docinfo:modified" content="2014-02-08T19:57:12Z" />
>
> <meta name="pdf:encrypted" content="false" />
>
> <meta name="dc:title" content="Presentation1" />
>
> <meta name="cp:subject" content="" />
>
> <meta name="pdf:docinfo:subject" content="" />
>
> <meta name="pdf:hasMarkedContent" content="false" />
>
> <meta name="Content-Type" content="application/pdf" />
>
> <meta name="pdf:docinfo:creator" content="grantingersoll" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="dc:subject" content="" />
>
> <meta name="pdf:producer" content="Mac OS X 10.9.1 Quartz PDFContext" />
>
> <meta name="access_permission:extract_for_accessibility" content="true" />
>
> <meta name="access_permission:assemble_document" content="true" />
>
> <meta name="xmpTPg:NPages" content="1" />
>
> <meta name="pdf:hasXMP" content="false" />
>
> <meta name="access_permission:extract_content" content="true" />
>
> <meta name="access_permission:can_print" content="true" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.DefaultParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.pdf.PDFParser" />
>
> <meta name="meta:keyword" content="" />
>
> <meta name="access_permission:can_modify" content="true" />
>
> <meta name="pdf:docinfo:producer" content="Mac OS X 10.9.1 Quartz
> PDFContext" />
>
> <meta name="pdf:docinfo:created" content="2014-02-08T19:57:12Z" />
>
> <title>Presentation1</title>
>
> </head>
>
> <body><div class="page"><p />
>
> <img src="embedded:image0.png" alt="image0.png" /></div>
>
> </body></html><html xmlns=http://www.w3.org/1999/xhtml>
>
> <head>
>
> <meta name="Transparency Alpha" content="none" />
>
> <meta name="tiff:ImageLength" content="261" />
>
> <meta name="Compression CompressionTypeName" content="deflate" />
>
> <meta name="Data BitsPerSample" content="8 8 8" />
>
> <meta name="Data PlanarConfiguration" content="PixelInterleaved" />
>
> <meta name="Dimension VerticalPixelSize" content="0.35273367" />
>
> <meta name="IHDR" content="width=934, height=261, bitDepth=8,
> colorType=RGB, compressionMethod=deflate, filterMethod=adaptive,
> interlaceMethod=none" />
>
> <meta name="embeddedResourceType" content="INLINE" />
>
> <meta name="Chroma ColorSpaceType" content="RGB" />
>
> <meta name="tiff:BitsPerSample" content="8 8 8" />
>
> <meta name="Content-Type" content="image/png" />
>
> <meta name="height" content="261" />
>
> <meta name="pHYs" content="pixelsPerUnitXAxis=2835,
> pixelsPerUnitYAxis=2835, unitSpecifier=meter" />
>
> <meta name="Dimension PixelAspectRatio" content="1.0" />
>
> <meta name="resourceName" content="image0.png" />
>
> <meta name="pdf:hasXMP" content="false" />
>
> <meta name="Compression NumProgressiveScans" content="1" />
>
> <meta name="Content-Type-Parser-Override" content="image/ocr-png" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.DefaultParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.image.ImageParser" />
>
> <meta name="X-TIKA:Parsed-By"
> content="org.apache.tika.parser.ocr.TesseractOCRParser" />
>
> <meta name="Dimension HorizontalPixelSize" content="0.35273367" />
>
> <meta name="Chroma BlackIsZero" content="true" />
>
> <meta name="Compression Lossless" content="true" />
>
> <meta name="X-TIKA:embedded_depth" content="1" />
>
> <meta name="width" content="934" />
>
> <meta name="Dimension ImageOrientation" content="Normal" />
>
> <meta name="X-TIKA:embedded_resource_path" content="/image0.png" />
>
> <meta name="tiff:ImageWidth" content="934" />
>
> <meta name="Chroma NumChannels" content="3" />
>
> <meta name="Data SampleFormat" content="UnsignedIntegral" />
>
> <title></title>
>
> </head>
>
> <body /></html>
>
> [INFO]
>
> [ERROR] Tests run: 305, Failures: 1, Errors: 0, Skipped: 10
>
> [INFO]
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] Reactor Summary for Apache Tika parent 2.0.0-SNAPSHOT:
>
> [INFO]
>
> [INFO] Apache Tika parent ................................. SUCCESS [ 2.952 s]
> [INFO] Apache Tika core ................................... SUCCESS [ 37.037 s]
> [INFO] tika-parsers ....................................... SUCCESS [ 0.225 s]
> [INFO] Apache Tika classic parser modules and package ..... SUCCESS [ 0.500 s]
> [INFO] Apache Tika classic parser modules ................. SUCCESS [ 0.261 s]
> [INFO] tika-parser-html-commons ........................... SUCCESS [ 1.773 s]
> [INFO] tika-parser-digest-commons ......................... SUCCESS [ 0.998 s]
> [INFO] tika-parser-mail-commons ........................... SUCCESS [ 1.627 s]
> [INFO] tika-parser-xmp-commons ............................ SUCCESS [ 2.008 s]
> [INFO] tika-parser-zip-commons ............................ SUCCESS [ 2.405 s]
> [INFO] tika-parser-image-module ........................... SUCCESS [ 4.140 s]
> [INFO] tika-parser-ocr-module ............................. SUCCESS [ 16.227 s]
> [INFO] tika-parser-audiovideo-module ...................... SUCCESS [ 2.998 s]
> [INFO] tika-parser-text-module ............................ SUCCESS [ 3.578 s]
> [INFO] tika-parser-code-module ............................ SUCCESS [ 3.739 s]
> [INFO] tika-parser-html-module ............................ SUCCESS [ 3.842 s]
> [INFO] tika-parser-font-module ............................ SUCCESS [ 2.291 s]
> [INFO] tika-parser-xml-module ............................. SUCCESS [ 2.637 s]
> [INFO] tika-parser-microsoft-module ....................... SUCCESS [ 46.829 s]
> [INFO] tika-parser-pkg-module ............................. SUCCESS [ 3.862 s]
> [INFO] tika-parser-pdf-module ............................. SUCCESS [ 15.538 s]
> [INFO] tika-parser-apple-module ........................... SUCCESS [ 3.497 s]
> [INFO] tika-parser-cad-module ............................. SUCCESS [ 2.195 s]
> [INFO] tika-parser-mail-module ............................ SUCCESS [ 9.893 s]
> [INFO] tika-parser-miscoffice-module ...................... SUCCESS [ 8.474 s]
> [INFO] tika-parser-news-module ............................ SUCCESS [ 1.982 s]
> [INFO] tika-parser-crypto-module .......................... SUCCESS [ 2.624 s]
> [INFO] Apache Tika classic parser package ................. FAILURE [02:15 min]
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] BUILD FAILURE
>
> [INFO]
> ------------------------------------------------------------------------
>
> [INFO] Total time:  05:21 min
>
> [INFO] Finished at: 2021-04-15T10:00:49-04:00
>
> [INFO]
> ------------------------------------------------------------------------
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test)
> on project tika-parsers-classic-package: There are test failures.
>
> [ERROR]
>
> [ERROR] Please refer to
> C:\tika\tika-parsers\tika-parsers-classic\tika-parsers-classic-package\target\surefire-reports
> for the individual test results.
>
> [ERROR] Please refer to dump files (if any exist) [date].dump,
> [date]-jvmRun[N].dump and [date].dumpstream.
>
> [ERROR] -> [Help 1]
>
> [ERROR]
>
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
>
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>
> [ERROR]
>
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
>
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
> [ERROR]
>
> [ERROR] After correcting the problems, you can resume the build with the
> command
>
> [ERROR]   mvn <args> -rf :tika-parsers-classic-package
>
>
>
> c:\tika>
>
> *Peter Kronenberg*  *| * *Senior AI Analytic ENGINEER *
>
> *C: 703.887.5623*
>
> 4303 W. 119th St., Leawood, KS 66209
> WWW.TORCH.AI <http://www.torch.ai/>
