[ https://issues.apache.org/jira/browse/TIKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086568#comment-18086568 ]
Adrian Bird commented on TIKA-4750:
-----------------------------------
> my guess is that the tess4j jar (tika-parser-tess4j-module) wasn't on your
> classpath. We've improved the warning message when this happens.
That is definitely part of the problem, but so is the fact that I didn't read
the documentation closely enough. Having read it more carefully, I realized
there were a lot of things I hadn't done (think of me as a Windows user who
doesn't do development and to whom 'jni' and 'maven' are a mystery).
Apart from tess4j.jar I also needed to download lept4j.jar and jna. I added
those to my classpath.
I also needed tika-parser-tess4j-module. I managed to find it and, although I
wasn't sure what to do with it, I added it to the classpath as well.
None of this made any difference.
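
For anyone retracing these steps, a small check can at least show whether the JVM sees the jars listed above. The following is a minimal sketch, not part of Tika: `ClasspathCheck` is a hypothetical helper, and the three default class names are assumptions about the entry-point classes that tess4j, lept4j and jna ship, so they may need adjusting for other versions. It only reports whether a class can be located; it does not show whether Tika has registered the 'tess4j-parser' component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of Tika): reports which classes
// the JVM can locate on the current classpath.
final class ClasspathCheck {

    // Assumed entry-point classes for tess4j, lept4j and jna;
    // adjust these if your jar versions use different names.
    static final List<String> DEFAULT_CLASSES = List.of(
            "net.sourceforge.tess4j.Tesseract",
            "net.sourceforge.lept4j.Leptonica",
            "com.sun.jna.Native");

    // Returns true if the named class can be located. The class is not
    // initialized, so no native library is loaded by this check.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of names that could not be located, in input order.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Class names may be passed as arguments; otherwise the assumed defaults are used.
        List<String> names = args.length > 0 ? List.of(args) : DEFAULT_CLASSES;
        for (String name : names) {
            System.out.println((isOnClasspath(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

It would be run with the same classpath that is given to Tika. One thing worth ruling out: when a program is started with `java -jar`, the JVM ignores both the `-cp` option and the CLASSPATH environment variable, so jars added that way are never seen.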
Given how complex this is from my point of view, getting it to work would
require me to learn about things I'm unfamiliar with, so I'm going to give up
on tess4j.
The documentation is probably ok for users who understand 'jni' and 'maven', or
who are willing to learn about them, so I'm not suggesting changes to the
documentation just to cater for people like me.
> tika-4.0.0-alpha1 - tess4j-parser not available
> -----------------------------------------------
>
> Key: TIKA-4750
> URL: https://issues.apache.org/jira/browse/TIKA-4750
> Project: Tika
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Adrian Bird
> Priority: Major
>
> I've tried to use the 'tess4j-parser' but get the following error:
>
> {noformat}
> DEBUG [main] 09:09:06,858 org.apache.tika.config.loader.TikaObjectMapperFactory Loaded component registry: parse-context
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: Unknown component type: 'tess4j-parser'
> at org.apache.tika.config.loader.ComponentInstantiator.instantiate(ComponentInstantiator.java:179)
> at org.apache.tika.config.loader.LoaderContext.instantiate(LoaderContext.java:110)
> at org.apache.tika.config.loader.ParserLoader.loadComponent(ParserLoader.java:61)
> at org.apache.tika.config.loader.ParserLoader.loadComponent(ParserLoader.java:46)
> at org.apache.tika.config.loader.AbstractSpiComponentLoader.load(AbstractSpiComponentLoader.java:107)
> at org.apache.tika.config.loader.TikaLoader.loadComponent(TikaLoader.java:683)
> at org.apache.tika.config.loader.TikaLoader.get(TikaLoader.java:647)
> at org.apache.tika.config.loader.TikaLoader.loadParsers(TikaLoader.java:247)
> at org.apache.tika.config.loader.TikaLoader.loadAutoDetectParser(TikaLoader.java:379)
> at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:901)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:532)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:267)
> Caused by: java.lang.ClassNotFoundException: Component 'tess4j-parser' is not registered. Components must be registered via @TikaComponent annotation or .idx file. Arbitrary class names are not allowed for security reasons.
> at org.apache.tika.serialization.ComponentNameResolver.resolveClass(ComponentNameResolver.java:116)
> at org.apache.tika.config.loader.ComponentInstantiator.instantiate(ComponentInstantiator.java:176)
> ... 11 more
> {noformat}
> FYI I've probably done all the testing I'm going to with this version.
>
>