[
https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187784#comment-15187784
]
Tim Allison commented on TIKA-1508:
-----------------------------------
bq. Maybe not too complex, but not as a start. Just my 2c.
The reason I propose this to start is so that we don't have to worry about
changing our config and backward compatibility ... :)
bq. I think the Solr way is complex to implement, considering that we don't
gain much after the effort (as of now we can just do Integer.parseInt() or
similar). Plus it introduces ambiguities between the type expected by parsers
and the values supplied from configuration.
I think we gain quite a bit. The reason I suggested it is tied to 3)...What we
would gain is automatic type checking/verification on loading from the config
file.
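To make the load-time check concrete, here is a minimal sketch of reading a {{<params>}} block into typed values. The {{int}} and {{str}} element names come from the example in the issue description; the {{bool}} element, the {{TypedParamLoader}} class, and its method names are assumptions for illustration only and not existing Tika API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: turns <params> children into typed values.
class TypedParamLoader {

    /** Parses a <params> element and converts each child according to its tag name. */
    static Map<String, Object> load(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse config XML", e);
        }
        Map<String, Object> params = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and other non-element nodes
            }
            Element el = (Element) n;
            String name = el.getAttribute("name");
            String text = el.getTextContent().trim();
            try {
                params.put(name, convert(el.getTagName(), text));
            } catch (IllegalArgumentException e) {
                // NumberFormatException is a subclass, so bad integers land here too
                throw new IllegalArgumentException(
                        "Bad value for parameter '" + name + "': " + text, e);
            }
        }
        return params;
    }

    /** Converts the raw text to the type named by the element tag. */
    static Object convert(String type, String text) {
        switch (type) {
            case "int":
                return Integer.valueOf(text);
            case "bool": // assumed element name, not in the issue's example
                if (!text.equals("true") && !text.equals("false")) {
                    throw new IllegalArgumentException("Not a boolean: " + text);
                }
                return Boolean.valueOf(text);
            case "str":
                return text;
            default:
                throw new IllegalArgumentException("Unsupported parameter type: " + type);
        }
    }
}
```

With something along these lines, a value like {{<int name="someparam1">abc</int>}} would fail once, when the config is loaded, rather than inside whichever parser happens to read it.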
If the configurator were something like this:
{code}
public static void configure(Configurable configurable, Map<String, ParamValue> params)
        throws TikaConfigParameterException {
    for (String k : params.keySet()) {
        // upper-case the first character to build the setter name
        String setterName =
                "set" + k.substring(0, 1).toUpperCase(Locale.ENGLISH) + k.substring(1);
        try {
            Class<?> clazz = configurable.getClass();
            ParamValue v = params.get(k);
            // look up the setter with the parameter type that matches the value
            switch (v.getType()) {
                case BOOLEAN:
                    clazz.getDeclaredMethod(setterName, Boolean.class)
                            .invoke(configurable, v.getBoolean());
                    break;
                case INTEGER:
                    clazz.getDeclaredMethod(setterName, Integer.class)
                            .invoke(configurable, v.getInteger());
                    break;
                //....
            }
        } catch (Exception e) {
            throw new TikaConfigParameterException("Exception with parameter: " + k
                    + " with class: " + configurable.getClass(), e);
        }
    }
}
{code}
Then each parser that had configurations wouldn't have to register its
configurable parameters (strike that suggestion above :) ), but there would be
an exception at creation time if the {{setN}} method with a correctly typed
parameter didn't exist.
In short, a small bit of code at the outset, but each parser wouldn't then have
to repeat the {{parseInt}} and handle NumberFormatExceptions, etc. Each
configurable parser wouldn't have to worry about configuration at all, except
to have appropriate setters.
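As a self-contained illustration of the same idea, the sketch below configures a toy parser through its setters and fails fast when no setter with a matching parameter type exists. {{SetterConfigurator}} and {{DemoParser}} are hypothetical names, and a plain {{Map<String, Object>}} stands in for the {{ParamValue}} map; none of this is existing Tika API.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout; a sketch of the approach, not Tika's API.
class SetterConfigurator {

    /** For each entry "x" -> value, calls setX(value), matching on the value's runtime type. */
    static void configure(Object target, Map<String, Object> params) {
        for (Map.Entry<String, Object> e : params.entrySet()) {
            String k = e.getKey();
            Object v = e.getValue();
            String setterName =
                    "set" + k.substring(0, 1).toUpperCase(Locale.ENGLISH) + k.substring(1);
            try {
                // Exact-type lookup: an Integer value needs setX(Integer), not setX(int)
                Method m = target.getClass().getMethod(setterName, v.getClass());
                m.invoke(target, v);
            } catch (NoSuchMethodException | IllegalAccessException
                    | InvocationTargetException ex) {
                throw new IllegalArgumentException("No usable setter " + setterName + "("
                        + v.getClass().getSimpleName() + ") on "
                        + target.getClass().getName(), ex);
            }
        }
    }
}

// Hypothetical parser with two configurable parameters and nothing else config-related.
class DemoParser {
    private Integer maxDepth = 1;
    private String label = "";

    public void setMaxDepth(Integer maxDepth) { this.maxDepth = maxDepth; }
    public void setLabel(String label) { this.label = label; }
    public Integer getMaxDepth() { return maxDepth; }
    public String getLabel() { return label; }
}
```

Passing {{maxDepth}} as an {{Integer}} and {{label}} as a {{String}} sets both fields, while passing {{maxDepth}} as a {{String}} raises an exception at configuration time because no {{setMaxDepth(String)}} exists. Note that this sketch only matches boxed parameter types; supporting primitive setters such as {{setMaxDepth(int)}} would need an extra mapping step.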
> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
> Key: TIKA-1508
> URL: https://issues.apache.org/jira/browse/TIKA-1508
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser,
> it would be great if we could specify parser parameters in the main config
> file, something along the lines of this:
> {noformat}
> <parser class="org.apache.tika.parser.audio.AudioParser">
>   <params>
>     <int name="someparam1">2</int>
>     <str name="someOtherParam2">something or other</str>
>   </params>
>   <mime>audio/basic</mime>
>   <mime>audio/x-aiff</mime>
>   <mime>audio/x-wav</mime>
> </parser>
> {noformat}