[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332209#comment-15332209 ]

Tim Allison edited comment on TIKA-1986 at 6/15/16 6:57 PM:
------------------------------------------------------------

From Nick:
bq. > I think that's exactly what ParseContext should be for... it should be a vehicle for Param passing. We can delineate by property name (FQ) and/or by class.
bq. I view ParseContext as somewhere you configure things on a per-document basis, not a per-parser basis.
bq. So, need to set where Tesseract lives on your system? Applies to everything, so on the parser. Need to tell Tesseract to use a German rather than an English dictionary on this particular jpeg? Applies to just this one document being parsed, so on the ParseContext.
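To make Nick's split concrete, here is a minimal, self-contained sketch. None of these classes are Tika's API: `ContextSketch`, `OcrLanguage`, and `OcrParserSketch` are hypothetical stand-ins, and `ContextSketch` only loosely mimics the class-keyed set/get idea behind ParseContext. The Tesseract location is fixed when the parser is built (per-parser), while the OCR language rides along with each document (per-document).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a class-keyed, per-document context (loosely modeled on ParseContext).
final class ContextSketch {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object v = values.get(key);
        return v == null ? defaultValue : key.cast(v);
    }
}

// Hypothetical per-document setting: the OCR language for one particular file.
final class OcrLanguage {
    final String code;

    OcrLanguage(String code) {
        this.code = code;
    }
}

// Hypothetical parser: where Tesseract lives applies to every document, so it is set once at construction.
final class OcrParserSketch {
    private final String tesseractPath;

    OcrParserSketch(String tesseractPath) {
        this.tesseractPath = tesseractPath;
    }

    // Describes what would be run instead of actually invoking Tesseract.
    String describe(String fileName, ContextSketch context) {
        OcrLanguage lang = context.get(OcrLanguage.class, new OcrLanguage("eng"));
        return "exe=" + tesseractPath + ", file=" + fileName + ", lang=" + lang.code;
    }
}

class ParamScopeDemo {
    public static void main(String[] args) {
        OcrParserSketch parser = new OcrParserSketch("/usr/bin/tesseract"); // per-parser

        ContextSketch german = new ContextSketch();
        german.set(OcrLanguage.class, new OcrLanguage("deu"));              // per-document

        System.out.println(parser.describe("scan.jpg", german));
        System.out.println(parser.describe("other.jpg", new ContextSketch()));
    }
}
```

The same parser instance serves both calls; only the context changes between documents.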

[link|https://issues.apache.org/jira/browse/TIKA-1508?focusedCommentId=15187205&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15187205]

My current proposal is to add [[email protected]]'s fantastic beans for the initialization step, but to go back to what we have now for runtime/per-file configuration (setting parameters programmatically).

If we allow users to use the Param stuff programmatically, they'll have to write some nasty Java like this:

{noformat}
        Param<Boolean> paramVal = new Param<>("sortByPosition", new Boolean(true));
        context.setParam(PDFParser.class, paramVal);
{noformat}

And there are no compile-time guarantees that "sortByPosition" exists for PDFParser...
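A hedged sketch of why the string key worries me: with a string-keyed map, a misspelled key compiles and silently falls back to the default, while a typed config object turns the same typo into a compile error. `StringParams` and `PdfConfigSketch` are hypothetical classes for illustration only; the closest real analogue in Tika is `PDFParserConfig` with its `setSortByPosition` setter, but this sketch does not depend on Tika.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical string-keyed parameters: any key compiles, so typos surface only at runtime, if at all.
final class StringParams {
    private final Map<String, Object> values = new HashMap<>();

    void set(String name, Object value) {
        values.put(name, value);
    }

    // Falls back to the default when the key is absent, which is how a misspelled key goes unnoticed.
    boolean getBoolean(String name, boolean defaultValue) {
        Object v = values.get(name);
        return v instanceof Boolean ? (Boolean) v : defaultValue;
    }
}

// Hypothetical typed config: the setter either exists or the call does not compile.
final class PdfConfigSketch {
    private boolean sortByPosition = false;

    void setSortByPosition(boolean sortByPosition) {
        this.sortByPosition = sortByPosition;
    }

    boolean getSortByPosition() {
        return sortByPosition;
    }
}

class TypedVsStringDemo {
    public static void main(String[] args) {
        StringParams params = new StringParams();
        params.set("sortByPositon", true);                              // typo compiles fine...
        System.out.println(params.getBoolean("sortByPosition", false)); // ...and silently reads false

        PdfConfigSketch config = new PdfConfigSketch();
        config.setSortByPosition(true); // config.setSortByPositon(true) would be a compile error
        System.out.println(config.getSortByPosition());
    }
}
```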



> support parser parameters with type (int, double, etc.) in configuration XML file
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-1986
>                 URL: https://issues.apache.org/jira/browse/TIKA-1986
>             Project: Tika
>          Issue Type: Sub-task
>          Components: config
>            Reporter: Thamme Gowda
>             Fix For: 1.14
>
>
> Tika Configuration should be enhanced to support basic types like int, double, boolean, URL, and file.
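For the typed parameters this issue asks for, one plausible shape is a type name next to each value string in the XML, converted when the config is loaded. The sketch below is only an illustration under that assumption: `ParamTypes` and the type names `int`, `double`, `boolean`, `url`, `file`, and `string` are hypothetical and are not necessarily the syntax Tika ends up using. It converts a (type, text) pair into a typed Java value and rejects sloppy booleans rather than quietly mapping them to false.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical converter from a (type name, text value) pair, as read from a config XML, to a typed value.
final class ParamTypes {
    private ParamTypes() {}

    static Object convert(String type, String text) {
        String value = text.trim();
        switch (type) {
            case "int":
                return Integer.valueOf(value);
            case "double":
                return Double.valueOf(value);
            case "boolean":
                return parseStrictBoolean(value);
            case "url":
                try {
                    return URI.create(value).toURL();
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("Bad URL: " + value, e);
                }
            case "file":
                return new File(value);
            case "string":
                return value;
            default:
                throw new IllegalArgumentException("Unsupported param type: " + type);
        }
    }

    // Boolean.parseBoolean treats any non-"true" text as false; reject typos like "ture" instead.
    private static Boolean parseStrictBoolean(String value) {
        if (value.equalsIgnoreCase("true")) return Boolean.TRUE;
        if (value.equalsIgnoreCase("false")) return Boolean.FALSE;
        throw new IllegalArgumentException("Not a boolean: " + value);
    }
}

class ParamTypesDemo {
    public static void main(String[] args) {
        System.out.println(ParamTypes.convert("int", " 42 "));
        System.out.println(ParamTypes.convert("double", "0.5"));
        System.out.println(ParamTypes.convert("boolean", "true"));
    }
}
```

Failing fast on an unknown type or malformed value at load time is the point: a bad config should break startup, not silently change parsing behavior.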



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
