[
https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186434#comment-15186434
]
ASF GitHub Bot commented on TIKA-1508:
--------------------------------------
GitHub user thammegowda opened a pull request:
https://github.com/apache/tika/pull/91
TIKA-1508 : Add uniformity to parser parameter configuration - contributed
by Thamme Gowda
1. Added a `Configurable` interface.
This can be used by any service, such as `Parser` or `Detector`, that takes
configurable parameters.
2. Added a `ConfigurableParser` interface, which extends the `Parser` interface.
I didn't add a new method to the existing `Parser` interface because
that would break compatibility.
3. `AbstractParser` implements `ConfigurableParser` and provides a
default implementation of the configure() contract.
I think this is safe and doesn't break anything.
In addition, all parsers that extend `AbstractParser` can easily
access their config from `TikaConfig` if they want to.
4. Added a TODO to `TikaConfig`:
after this, it should allow multiple instances of the same parser with
different runtime configurations.
5. Modified `TikaConfig` to detect whether an instance can be configured.
If so, it checks whether params are available in the XML file, parses the
params, and invokes the configure(ctx) method with them.
6. Added `DummyConfigurableParser`, which simply copies its parameters to
the metadata for the sake of testing.
7. Added a sample XML config file for testing.
Added `ConfigurableParserTest`, which performs an end-to-end test of all
of the above.
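
The relationship between the types listed above can be pictured with a small, self-contained sketch. It does not reproduce the code in the pull request: the `Parser` and `Configurable` interfaces below are simplified stand-ins, the `Map<String, String>` parameter type is an assumption (the description only says configure(ctx) is invoked with the params), and `ConfigurableSketch.load` is a hypothetical helper that mimics the "configure only if the instance supports it" step.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the Configurable contract; the real signature may differ. */
interface Configurable {
    void configure(Map<String, String> params);
}

/** Simplified stand-in for Tika's Parser; the real interface has a different parse() signature. */
interface Parser {
    Map<String, String> parse(String input);
}

/** A separate interface, so the existing Parser contract is left untouched. */
interface ConfigurableParser extends Parser, Configurable {
}

/** Supplies a default configure() that stores the params for subclasses to read. */
abstract class AbstractParser implements ConfigurableParser {
    private Map<String, String> params = Collections.emptyMap();

    @Override
    public void configure(Map<String, String> params) {
        // Defensive copy so later changes by the caller do not leak in.
        this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
    }

    protected Map<String, String> getParams() {
        return params;
    }
}

/** Copies its params into the returned metadata, as the test parser is described to do. */
class DummyConfigurableParser extends AbstractParser {
    @Override
    public Map<String, String> parse(String input) {
        Map<String, String> metadata = new LinkedHashMap<>(getParams());
        metadata.put("content-length", String.valueOf(input.length()));
        return metadata;
    }
}

class ConfigurableSketch {
    /** Hypothetical loader step: configure the instance only if it is Configurable. */
    static Parser load(Parser parser, Map<String, String> params) {
        if (parser instanceof Configurable) {
            ((Configurable) parser).configure(params);
        }
        return parser;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("someparam1", "2");
        Parser parser = load(new DummyConfigurableParser(), params);
        System.out.println(parser.parse("hello"));
    }
}
```

Keeping `configure()` on a separate interface means a parser that implements only `Parser` passes through `load` unchanged, which is the compatibility argument made in item 2.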
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/thammegowda/tika TIKA-1508
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tika/pull/91.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #91
----
commit b2cf23178ede925b0ef23f88ebf1aff95c8c157c
Author: Thamme Gowda
Date: 2016-03-09T02:23:19Z
Add uniformity to parser parameter configuration.
commit ae51417d8881dd90b921f02c2677a7d5bfd69a30
Author: Thamme Gowda
Date: 2016-03-09T03:23:47Z
Remove unwanted TODO
----
> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
> Key: TIKA-1508
> URL: https://issues.apache.org/jira/browse/TIKA-1508
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser,
> it would be great if we could specify parser parameters in the main config
> file, something along the lines of this:
> {noformat}
> <parser class="org.apache.tika.parser.audio.AudioParser">
>   <params>
>     <int name="someparam1">2</int>
>     <str name="someOtherParam2">something or other</str>
>   </params>
>   <mime>audio/basic</mime>
>   <mime>audio/x-aiff</mime>
>   <mime>audio/x-wav</mime>
> </parser>
> {noformat}
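
As a rough illustration of how a `<params>` block like the one proposed above could be turned into typed values, here is a minimal sketch that uses only the JDK's DOM parser. It is not Tika's implementation: the `ParamsReader` class and its `readParams` method are hypothetical names, and support for only the `int` and `str` element types is an assumption taken from the example in the issue.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper; reads the <params> children of a single <parser> element. */
class ParamsReader {

    static Map<String, Object> readParams(String parserXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(parserXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes = doc.getDocumentElement().getElementsByTagName("params");
        if (paramsNodes.getLength() == 0) {
            return params; // No <params> block: nothing to configure.
        }

        NodeList children = paramsNodes.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // Skip whitespace text nodes and comments.
            }
            Element element = (Element) node;
            String name = element.getAttribute("name");
            String text = element.getTextContent().trim();
            // The element name carries the type; only the two types from the example are handled.
            switch (element.getTagName()) {
                case "int":
                    params.put(name, Integer.valueOf(text));
                    break;
                case "str":
                    params.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unsupported param type: " + element.getTagName());
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
              + "<params>"
              + "<int name=\"someparam1\">2</int>"
              + "<str name=\"someOtherParam2\">something or other</str>"
              + "</params>"
              + "<mime>audio/basic</mime>"
              + "</parser>";
        System.out.println(readParams(xml));
    }
}
```

The `<mime>` elements are ignored here because only the children of `<params>` are visited; a real loader would read both.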
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)