[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832670#comment-16832670 ]
Hudson commented on TIKA-2749:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1654 (See [https://builds.apache.org/job/Tika-trunk/1654/])
TIKA-2749 -- add initial, optional "AUTO" mode for OCR'ing of PDF pages (tallison: [https://github.com/apache/tika/commit/f72841353c30ba0ece3bdd40570ccdb03c3f8994])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* (edit) CHANGES.txt
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java

> OCR on PDFs should "just work" out of the box
> ---------------------------------------------
>
>          Key: TIKA-2749
>          URL: https://issues.apache.org/jira/browse/TIKA-2749
>      Project: Tika
>   Issue Type: Task
>     Reporter: Tim Allison
>     Priority: Major
>  Attachments: 06.Qui peut réduire vos amendes TVA bis (1).pdf
>
>
> There are now two different ways (with various parameters) to trigger OCR on
> inline images within PDFs. The user has to 1) understand that these are
> available and then 2) elect to turn one of those on.
> I think we should make OCR'ing on PDFs "just work" perhaps with a hybrid
> strategy between the 2 options. Users should still be allowed to configure
> as they wish, of course.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)