[
https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304861#comment-15304861
]
Thamme Gowda commented on TIKA-1986:
------------------------------------
Thanks for the review and feedback. [[email protected]]
bq. Would it be possible to add serialization of the parameters to
TikaConfigSerializer? I may have missed this. Not crucial for the initial
patch, but we'll want to add this.
Yes, the functionality is [right here in
save(...)|https://github.com/apache/tika/blob/01869923533b330ec7728995e3ee5feceee1b90e/tika-core/src/main/java/org/apache/tika/config/Param.java#L175].
However, it still needs to be integrated into TikaConfigSerializer.
I was thinking of updating the entire Configuration <-> XML (de)serialization
using JAXB. I found that the deserialization and service loading are bundled
together, so I could proceed without separating them.
bq. Going forward (Tika 2.0), how do we want the parsers to interact with the
configuration? Should they interact directly with the params, or should they
initialize their current param variables with the params?
I really liked your suggestion of invoking the setter methods to initialize
the parser directly. +1 for having that integrated in Tika 2.0. However,
let's get the same functionality in a slightly cleaner way by using annotations
instead of crude reflective method calls.
Let's make a contract: if an attribute of a parser object carries a @Param
annotation, we shall initialize it. What do you say?
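A rough sketch of what such a contract could look like. Everything here is an assumption for illustration: the @Param annotation, the SketchParser class and the ParamInjector helper are hypothetical names defined in the sketch itself, not code from the patch, and the parameters are passed as a plain Map<String, Object> to keep it self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical marker annotation (an assumption, not the patch's API).
// RUNTIME retention is required so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Param {
    String name() default ""; // config key; empty means "use the field name"
}

// Hypothetical parser with two configurable attributes.
class SketchParser {
    @Param int maxDepth = 1;
    @Param(name = "strict") boolean strictMode = false;
    String untouched = "default"; // not annotated, so never injected
}

// Hypothetical helper that honours the contract for any object.
class ParamInjector {
    static void inject(Object target, Map<String, Object> params) {
        for (Field field : target.getClass().getDeclaredFields()) {
            Param annotation = field.getAnnotation(Param.class);
            if (annotation == null) {
                continue; // only annotated attributes are initialized
            }
            String key = annotation.name().isEmpty()
                    ? field.getName() : annotation.name();
            if (!params.containsKey(key)) {
                continue; // keep the field's default value
            }
            Object value = params.get(key);
            // Basic error checking: the value must match the field type.
            if (!box(field.getType()).isInstance(value)) {
                throw new IllegalArgumentException(
                        "Param '" + key + "' expects " + field.getType().getName());
            }
            try {
                field.setAccessible(true);
                field.set(target, value); // unboxes for primitive fields
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Maps the primitive types used above to their wrapper classes.
    private static Class<?> box(Class<?> type) {
        if (type == int.class) return Integer.class;
        if (type == long.class) return Long.class;
        if (type == double.class) return Double.class;
        if (type == boolean.class) return Boolean.class;
        return type;
    }
}
```

With this shape, a parser only declares its annotated fields; the type check and the assignment live in one place.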
bq. What's the benefit of configuring with a ParseContext instead of a
Map<String, Param<?>>? Along the same lines, configure doesn't actually
configure, it just sets the Map... Should we rename it as a setter? Or should
we make it do something?
1. Right! The base class method doesn't configure anything. Child parsers
should override that method to configure themselves; tika-core supplies the
parameters they need.
2. I think using a wrapper object like ParseContext allows us to make
improvements later without breaking compatibility. Using Map<String, Param<?>>
is too limiting for future enhancements. I have a feeling that the parsers are
going to become more complex and will require many other resources to initialize.
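To illustrate both points, here is a minimal sketch under stated assumptions. SketchContext is a hypothetical stand-in for a wrapper object (it is not Tika's ParseContext), and SketchParams, SketchConfigurableParser and SketchOcrParser are likewise invented names. The base method only stores the context, the child overrides it to configure itself, and new kinds of resources can be added to the context later without changing the configure(...) signature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper object keyed by class (a stand-in, not Tika's class).
class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(entries.get(key)); // null when nothing was set
    }
}

// Hypothetical holder for the parameters read from the configuration.
class SketchParams {
    final Map<String, Object> values = new HashMap<>();
}

// Base class: configure(...) only remembers the context.
abstract class SketchConfigurableParser {
    protected SketchContext context;

    void configure(SketchContext context) {
        this.context = context;
    }
}

// Child parser: overrides configure(...) to set itself up.
class SketchOcrParser extends SketchConfigurableParser {
    int timeoutSeconds = 30; // default used when no parameter is supplied

    @Override
    void configure(SketchContext context) {
        super.configure(context);
        SketchParams params = context.get(SketchParams.class);
        if (params != null && params.values.get("timeout") instanceof Integer) {
            timeoutSeconds = (Integer) params.values.get("timeout");
        }
    }
}
```

If a parser later needs something other than parameters, it can be registered in the context under its own class key, and existing parsers are unaffected.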
bq. IIRC AbstractParser is only there as syntactic sugar to gloss over the
newer requirement to pass in a ParseContext at parse time. In 2.0,
AbstractParser will go away...I think. So, might be better to make
ConfigurableParser an abstract class that handles all of the functionality
instead of an interface
I missed Tika 2.0 design discussions. +1 for the suggestion.
bq. Not crucial for the initial patch, but it would be great if we could add
error checking/automatic configuration (perhaps via reflection) at the level of
the ConfigurableParser so that each parser (configurable?) doesn't have to set
their own params.
+1, let's define a @Param annotation and take it forward.
> support parser parameters with type (int, double, etc) in configuration XML
> file
> --------------------------------------------------------------------------------
>
> Key: TIKA-1986
> URL: https://issues.apache.org/jira/browse/TIKA-1986
> Project: Tika
> Issue Type: Sub-task
> Components: config
> Reporter: Thamme Gowda
> Fix For: 1.14
>
>
> Tika Configuration should be enhanced to support basic types like int,
> double, boolean, url, file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)