[
https://issues.apache.org/jira/browse/TIKA-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346899#comment-17346899
]
Luís Filipe Nassif commented on TIKA-3384:
------------------------------------------
{quote}Would users have to configure the SupplementingParser?
{quote}
I think it is possible to provide some preconfigured supplementing parsers for
users in the Tika config.
{quote}Can we use the SupplementingParser for per page processing as in PDFs?
{quote}
That's a good point. I don't think that is currently possible.
> Convert new transcribe package to a Parser
> ------------------------------------------
>
> Key: TIKA-3384
> URL: https://issues.apache.org/jira/browse/TIKA-3384
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> This is a proposal to convert [~lewismc] et al's awesome new transcribe code
> into a parser along the lines of Tesseract.
> In 2.x, I inverted the call order from 1.x. The image parsers now look to
> see if there's a parser that supports a pseudo MIME type, like
> {{image/ocr-jpeg}}; if there is, they apply that parser to the stream. We
> could do the same thing with media files that the new transcription package
> supports.
> Those who want only OCR/transcription can turn off the image
> parsers and then decorate the OCR parser, for example, with {{supports
> "image/jpeg"}}, so that parser will be called directly.
> What do you think?
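
To make the call order described in the issue concrete, here is a minimal, self-contained sketch of the pseudo-MIME delegation idea. It does not use Tika's actual API: {{SimpleParser}}, {{Registry}}, {{ImageParser}}, {{OcrParser}} and the {{pseudoType}} helper are all hypothetical names made up for illustration, and the {{ocr-}} prefix rule is simply inferred from the {{image/ocr-jpeg}} example above. The same shape would presumably carry over to media files and a transcription parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Self-contained sketch; none of these types are Tika's real classes.
class PseudoMimeDispatch {

    // Hypothetical, much-simplified stand-in for a parser.
    interface SimpleParser {
        Set<String> supportedTypes();
        String parse(String mimeType, byte[] data);
    }

    // Hypothetical registry: media type -> parser that claims to support it.
    static final class Registry {
        private final Map<String, SimpleParser> byType = new HashMap<>();

        void register(SimpleParser parser) {
            for (String type : parser.supportedTypes()) {
                byType.put(type, parser);
            }
        }

        SimpleParser lookup(String type) {
            return byType.get(type); // null when nothing supports the type
        }
    }

    // Assumed naming rule: "image/jpeg" -> "image/ocr-jpeg".
    static String pseudoType(String mimeType) {
        int slash = mimeType.indexOf('/');
        return mimeType.substring(0, slash + 1) + "ocr-" + mimeType.substring(slash + 1);
    }

    // Image parser: does its own work, then delegates to whichever parser
    // supports the pseudo type, if one is registered.
    static final class ImageParser implements SimpleParser {
        private final Registry registry;

        ImageParser(Registry registry) { this.registry = registry; }

        @Override public Set<String> supportedTypes() { return Set.of("image/jpeg"); }

        @Override public String parse(String mimeType, byte[] data) {
            String result = "metadata(" + mimeType + ")";
            String pseudo = pseudoType(mimeType);
            SimpleParser delegate = registry.lookup(pseudo);
            if (delegate != null) {
                result += " + " + delegate.parse(pseudo, data);
            }
            return result;
        }
    }

    // OCR parser whose supported types are configurable, so it can claim
    // either the pseudo type or the real type.
    static final class OcrParser implements SimpleParser {
        private final Set<String> types;

        OcrParser(Set<String> types) { this.types = types; }

        @Override public Set<String> supportedTypes() { return types; }

        @Override public String parse(String mimeType, byte[] data) {
            return "ocr-text(" + data.length + " bytes)";
        }
    }

    // Default setup: the image parser handles image/jpeg and calls OCR via the pseudo type.
    static String parseWithImageParser(byte[] data) {
        Registry registry = new Registry();
        registry.register(new OcrParser(Set.of("image/ocr-jpeg")));
        registry.register(new ImageParser(registry));
        return registry.lookup("image/jpeg").parse("image/jpeg", data);
    }

    // OCR-only setup: no image parser; the OCR parser claims image/jpeg directly.
    static String parseOcrOnly(byte[] data) {
        Registry registry = new Registry();
        registry.register(new OcrParser(Set.of("image/jpeg")));
        return registry.lookup("image/jpeg").parse("image/jpeg", data);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        System.out.println(parseWithImageParser(data)); // metadata(image/jpeg) + ocr-text(3 bytes)
        System.out.println(parseOcrOnly(data));         // ocr-text(3 bytes)
    }
}
```

In the first setup the image parser stays in charge and OCR is an optional extra; in the second, the image parser is turned off and the OCR parser is called directly because it has been told to support the real type.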