[
https://issues.apache.org/jira/browse/TIKA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263295#comment-16263295
]
ASF GitHub Bot commented on TIKA-1988:
--------------------------------------
chrismattmann commented on issue #186: fix for TIKA-1988 contributed by
[email protected]
URL: https://github.com/apache/tika/pull/186#issuecomment-346464668
Hi @r00t1ng, got it. If you are extracting text from image-based PDFs
through the Python wrapper, it should work. You can control which
parsers are called by providing a custom tika-config.xml file.
Depending on the type of PDF, you should check:
1. Does Tesseract (outside of Tika) extract text from the PDF? If so,
what settings were used on the command line?
2. If Tesseract doesn't extract text outside of Tika, then Tika won't
either, because it is just a pass-through to Tesseract for that part.
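The first check above can be scripted. What follows is a minimal sketch, not part of the original comment, that runs Tesseract directly with no Tika involved. It assumes the `tesseract` binary is on PATH and that the PDF page has already been rendered to an image, since Tesseract reads images rather than PDFs. The helper names, the file name `page-1.png`, and the default language and page segmentation mode are all hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> List[str]:
    """Build the argv for a standalone Tesseract run that prints text to stdout.

    Assumption: Tesseract 4 or later, which spells the flag `--psm`
    (3.x releases used `-psm`).
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_outside_tika(image_path: str, lang: str = "eng", psm: int = 3) -> Optional[str]:
    """Run Tesseract directly and return whatever text it extracted.

    Returns None when the tesseract binary is not on PATH, so the caller
    can tell "not installed" apart from "installed but found no text".
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just yields empty or partial stdout
    )
    return result.stdout


# Hypothetical usage, with a page image rendered from the PDF beforehand:
#   text = ocr_outside_tika("page-1.png")
#   if not text or not text.strip():
#       print("Tesseract found no text, so Tika will not find any either")
```

If this returns text but Tika does not, the difference is likely in the settings Tika passes to Tesseract, which is why the command-line settings are worth noting.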
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Age Detection Tika Recogniser
> -----------------------------
>
> Key: TIKA-1988
> URL: https://issues.apache.org/jira/browse/TIKA-1988
> Project: Tika
> Issue Type: New Feature
> Reporter: Madhav Sharan
> Assignee: Chris A. Mattmann
> Labels: age, machine_learning, memex, nlp, opennlp
> Fix For: 1.17
>
>
> Author age can be the first feature, and more can be added later
> --
> Integrating work done on age classification. More details about the classifier
> are in the repo below:
> https://github.com/USCDataScience/Age-Predictor
> The Git repo has a Java client which can be integrated into Tika
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)