[ 
https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187784#comment-15187784
 ] 

Tim Allison commented on TIKA-1508:
-----------------------------------

bq. Maybe not too complex, but not as a start. Just my 2c.

The reason I propose doing this from the start is so that we don't have to worry 
later about changing our config format and maintaining backward compatibility. :)


bq. I think the Solr way is complex to implement, considering that we don't gain 
much for the effort (as of now we can just do Integer.parseInt() or similar). 
Plus, it introduces ambiguities between the types expected by parsers and the 
values supplied from configuration.

I think we gain quite a bit. The reason I suggested it is tied to 3)... What we 
would gain is automatic type checking and verification when loading from the 
config file.
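To make the load-time checking concrete, here is a minimal sketch that reads a Solr-style {{<params>}} block (the {{<int>}}/{{<str>}} elements from the issue description) into typed Java values with the JDK's built-in DOM parser. {{ParamsLoader}} is a hypothetical name, not an existing Tika class, and only the two element types shown in the issue are handled; anything else is rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical loader (not an existing Tika class): reads the typed children of a
// <params> element and fails at load time if a value doesn't match its declared type.
class ParamsLoader {

    static Map<String, Object> load(String paramsXml) {
        Element params = parse(paramsXml);
        Map<String, Object> typed = new LinkedHashMap<>();
        NodeList children = params.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text between elements
            }
            Element el = (Element) node;
            String name = el.getAttribute("name");
            String text = el.getTextContent();
            switch (el.getTagName()) {
                case "int":
                    try {
                        typed.put(name, Integer.valueOf(text.trim()));
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException(
                                "param '" + name + "' is declared <int> but has value: " + text, e);
                    }
                    break;
                case "str":
                    typed.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "param '" + name + "' has unknown type <" + el.getTagName() + ">");
            }
        }
        return typed;
    }

    // Parses the XML string and returns its root element, wrapping checked exceptions.
    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed <params> XML", e);
        }
    }
}
```

With this approach, a value such as {{<int name="someparam1">two</int>}} is rejected when the config file is read, rather than when a parser first tries to use it.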

If the configurator were something like this:

{code}
    public static void configure(Configurable configurable, Map<String, ParamValue> params)
            throws TikaConfigParameterException {
        for (Map.Entry<String, ParamValue> entry : params.entrySet()) {
            String k = entry.getKey();
            ParamValue v = entry.getValue();
            // capitalize the first character to build the setter name: "maxDepth" -> "setMaxDepth"
            String setterName =
                    "set" + k.substring(0, 1).toUpperCase(Locale.ENGLISH) + k.substring(1);
            try {
                // look up a public setter (including inherited ones) whose parameter
                // type matches the declared type of the value, then invoke it
                switch (v.getType()) {
                    case BOOLEAN:
                        configurable.getClass().getMethod(setterName, Boolean.class)
                                .invoke(configurable, v.getBoolean());
                        break;
                    case INTEGER:
                        configurable.getClass().getMethod(setterName, Integer.class)
                                .invoke(configurable, v.getInteger());
                        break;
                    //....
                }
            } catch (Exception e) {
                throw new TikaConfigParameterException("Exception with parameter: " + k
                        + " with class: " + configurable.getClass(), e);
            }
        }
    }
{code}

Then each parser with configurable parameters wouldn't have to register them 
(strike my suggestion above :) ); instead, an exception would be thrown at 
creation time if a {{setN}} method with a correctly typed parameter didn't exist.

In short, it's a small bit of code at the outset, but each parser then wouldn't 
have to repeat the {{parseInt}} calls and handle {{NumberFormatException}}s, etc. 
A configurable parser wouldn't have to worry about configuration at all, except 
to expose appropriate setters.
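Below is a self-contained, runnable sketch of the configurator idea. {{ParamValue}}, {{Configurable}}, {{TikaConfigParameterException}} and {{DemoParser}} are hypothetical stand-ins (none are existing Tika APIs). The setter is looked up by the value's declared type, so a missing or wrongly typed setter surfaces as an exception during configuration, and the parser itself only declares typed setters.

```java
import java.lang.reflect.Method;
import java.util.Locale;
import java.util.Map;

// All types below are hypothetical stand-ins for the proposal, not existing Tika APIs.
interface Configurable {}

class TikaConfigParameterException extends Exception {
    TikaConfigParameterException(String msg, Throwable cause) { super(msg, cause); }
}

final class ParamValue {
    enum Type { BOOLEAN, INTEGER, STRING }

    final Type type;
    final Object value;

    private ParamValue(Type type, Object value) { this.type = type; this.value = value; }

    static ParamValue of(boolean b) { return new ParamValue(Type.BOOLEAN, b); }
    static ParamValue of(int i) { return new ParamValue(Type.INTEGER, i); }
    static ParamValue of(String s) { return new ParamValue(Type.STRING, s); }

    // Java type that a matching setter must accept for this value.
    Class<?> javaType() {
        switch (type) {
            case BOOLEAN: return Boolean.class;
            case INTEGER: return Integer.class;
            default:      return String.class;
        }
    }
}

class Configurator {
    static void configure(Configurable target, Map<String, ParamValue> params)
            throws TikaConfigParameterException {
        for (Map.Entry<String, ParamValue> entry : params.entrySet()) {
            String key = entry.getKey();
            ParamValue v = entry.getValue();
            // "maxDepth" -> "setMaxDepth"
            String setterName = "set" + key.substring(0, 1).toUpperCase(Locale.ENGLISH)
                    + key.substring(1);
            try {
                // NoSuchMethodException if no public setter takes exactly this type
                Method setter = target.getClass().getMethod(setterName, v.javaType());
                setter.invoke(target, v.value);
            } catch (ReflectiveOperationException e) {
                throw new TikaConfigParameterException("Exception with parameter: " + key
                        + " with class: " + target.getClass(), e);
            }
        }
    }
}

// A parser only declares typed setters: no parseInt, no NumberFormatException handling.
class DemoParser implements Configurable {
    Integer maxDepth;
    Boolean extractInline;

    public void setMaxDepth(Integer maxDepth) { this.maxDepth = maxDepth; }
    public void setExtractInline(Boolean extractInline) { this.extractInline = extractInline; }
}
```

{{getMethod}} matches parameter types exactly, so a setter declared as {{setMaxDepth(int)}} would not be found for an {{Integer}} value; whether to also try primitive types is a design choice this sketch leaves open.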



> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, 
> it would be great if we could specify parser parameters in the main config 
> file, something along the lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
