[jira] [Comment Edited] (TIKA-3571) Add an interface for rendering engines

Tim Allison (Jira) Tue, 05 Apr 2022 11:56:05 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517647#comment-17517647
 ]


Tim Allison edited comment on TIKA-3571 at 4/5/22 6:55 PM:
-----------------------------------------------------------

-But seriously, this is embarrassing.  Do you have to have 
LibreOffice/OpenOffice installed or are the jars sufficient?-

Requires an installation, but still would be pretty cool...dockerized.

https://github.com/sbraconnier/jodconverter/wiki/Configuration


was (Author: [email protected]):
-But seriously, this is embarrassing.  Do you have to have 
LibreOffice/OpenOffice installed or are the jars sufficient?-

Requires an installation, but still would be pretty cool...dockerized.

> Add an interface for rendering engines
> --------------------------------------
>
>                 Key: TIKA-3571
>                 URL: https://issues.apache.org/jira/browse/TIKA-3571
>             Project: Tika
>          Issue Type: Wish
>            Reporter: Tim Allison
>            Priority: Major
>
> We've now seen a few requests for extracting text _and_ rendering PDFs, and 
> certainly it might be useful to have alternatives for rendering files (e.g. 
> this [Alfresco study|https://hub.alfresco.com/t5/alfresco-content-services-blog/pdf-rendering-engine-performance-and-fidelity-comparison/ba-p/287618]),
> including MSOffice or at least PPTx...
> And there are cases where users don't want the rendered images, but they do 
> want OCR to be run against the rendered images.
> I doubt I'll have a chance to work on this for a while, but I wanted to open 
> an issue for discussion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
