[jira] [Comment Edited] (TIKA-2290) PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

Tim Allison (JIRA) Mon, 05 Jun 2017 11:39:47 -0700

    [ https://issues.apache.org/jira/browse/TIKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037356#comment-16037356 ]


Tim Allison edited comment on TIKA-2290 at 6/5/17 6:38 PM:
-----------------------------------------------------------

Sorry for my delay. 

This works for me:
{noformat}
curl -T testOCR.pdf http://localhost:9998/tika --header "Accept: text/plain" \
  --header "X-Tika-PDFOcrStrategy: ocr_only"
{noformat}
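
For anyone calling the server from Java rather than curl, the same request can be sketched with the JDK's java.net.http client (Java 11+). This is only an illustration: the class and method names are hypothetical and not part of Tika, while the URL, header names, and values are copied from the curl command above. Run with no arguments, it prints the request it would send; pass a PDF path to actually PUT the file.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration; not part of Tika.
public class TikaOcrRequestSketch {

    // Builds the PUT request that `curl -T` would send, including the
    // PDF OCR strategy header with the capital "O" in "OcrStrategy".
    static HttpRequest buildRequest(byte[] pdfBytes) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9998/tika"))
                .header("Accept", "text/plain")
                .header("X-Tika-PDFOcrStrategy", "ocr_only")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Dry run: show the request without contacting a server.
            HttpRequest dryRun = buildRequest(new byte[0]);
            System.out.println(dryRun.method() + " " + dryRun.uri());
            dryRun.headers().map()
                    .forEach((name, values) -> System.out.println(name + ": " + values));
            return;
        }
        // Assumes a Tika server is listening on localhost:9998.
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(pdf), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the request construction in its own method makes it easy to check the header spelling without a running server.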


was (Author: [email protected]):
Sorry for my delay.  Is it a capitalization issue on the "O" in "OcrStrategy"?

This works for me:
{noformat}
curl -T testOCR.pdf http://localhost:9998/tika --header "Accept: text/plain" \
  --header "X-Tika-PDFOcrStrategy: ocr_only"
{noformat}

> PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
> --------------------------------------------------------------------------
>
>                 Key: TIKA-2290
>                 URL: https://issues.apache.org/jira/browse/TIKA-2290
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr, parser
>    Affects Versions: 1.13, 1.14
>            Reporter: Kevin Oberlag
>            Assignee: Tim Allison
>             Fix For: 2.0, 1.15
>
>
> I have created a Stack Overflow question on this topic 
> [here|http://stackoverflow.com/questions/42602834/x-tika-pdfocrstrategy-is-an-invalid-x-tika-ocr-header-error],
> but I'll reiterate the main issue. 
> I am trying to use TikaJAXRS and add headers to set PDFParser properties, 
> specifically the ocrStrategy property. However, when I add the header 
> X-Tika-PDFocrStrategy, I get an error stating that it is an invalid 
> X-Tika-OCR header.
> After looking into the source code, I believe the issue might be in the 
> 'fillParseContext' method in the TikaResource.java file.
> The if statement first looks for a key that starts with the OCR header 
> prefix, and since the PDFParser's property name contains 'ocr', the code 
> tries to find a property named 'ocrStrategy' in the OCRParser class, where 
> no such property exists.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
