Hi Sergey,

The thumbnail file name showing up in the stream is a bug I introduced in 2.3.x. I missed it in the fix in 2.4.0 (TIKA-3711), but I just fixed it now (TIKA-3745).

Are you not seeing "Hello Quarkus" at all, or is it just not the only text -- contains vs equals? I am seeing "Hello Quarkus" in at least 2.4.0-rc1.
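
To make the contains-vs-equals distinction concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical, and the sample string is made up to mimic the reported symptom; it is not actual Tika output.

```java
// Hypothetical helper for illustration; not part of Tika or quarkus-tika.
final class ExtractedTextCheck {

    // Strict check: passes only if the extracted text is exactly the expected string.
    static boolean matchesExactly(String extracted, String expected) {
        return extracted.equals(expected);
    }

    // Lenient check: passes if the expected string appears anywhere in the extracted text.
    static boolean containsExpected(String extracted, String expected) {
        return extracted.contains(expected);
    }

    public static void main(String[] args) {
        // Made-up sample: a stray thumbnail name followed by the real body text.
        String extracted = "Thumbnails/thumbnail.png\nHello Quarkus\n";
        System.out.println(matchesExactly(extracted, "Hello Quarkus"));
        System.out.println(containsExpected(extracted, "Hello Quarkus"));
    }
}
```

If the text were present alongside extra content like this, an equals-style assertion would fail while a contains-style one would pass, which is why it matters which of the two the test uses.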

On Fri, Apr 29, 2022 at 10:54 AM Sergey Beryozkin <[email protected]> wrote:
>
> Hi Tim, All
>
> I have a simple test reading a string content from an ODT doc failing, PDF,
> Excel are good, but something is going on with the ODT parsing.
>
> quarkus.odt in
> https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/main/resources/
> is expected to return a "Hello Quarkus" string
>
> but now the test fails with
>
> Expected: is "Hello Quarkus"
> Actual: Thumbnails/thumbnail.png.
>
> AutoDetectParser is used to parse, using a standard sequence
>
> https://github.com/quarkiverse/quarkus-tika/blob/main/runtime/src/main/java/io/quarkus/tika/TikaParser.java#L85
>
> May be it is an auto-detection issue, the media type which is used is here:
>
> https://github.com/quarkiverse/quarkus-tika/blob/main/integration-tests/src/test/java/io/quarkus/it/tika/TikaParserTest.java#L25
>
> Any hints will be appreciated
>
> Thanks, Sergey
