[ 
https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304861#comment-15304861
 ] 

Thamme Gowda commented on TIKA-1986:
------------------------------------

Thanks for the review and feedback. [[email protected]]

bq. Would it be possible to add serialization of the parameters to 
TikaConfigSerializer? I may have missed this. Not crucial for the initial 
patch, but we'll want to add this.

Yes, the functionality is [right here in 
save(...)|https://github.com/apache/tika/blob/01869923533b330ec7728995e3ee5feceee1b90e/tika-core/src/main/java/org/apache/tika/config/Param.java#L175].
However, it still needs to be integrated into TikaConfigSerializer.
I was thinking of updating the entire Configuration <-> XML (de)serialization 
using JAXB. I found that the deserialization and service loading are bundled 
together, so I could proceed without separating them.

bq. Going forward (Tika 2.0), how do we want the parsers to interact with the 
configuration? Should they interact directly with the params, or should they 
initialize their current param variables with the params?

I really liked your suggestion of invoking the setter methods to initialize 
the values directly. +1 for having that integrated in Tika 2.0. However, let's 
provide the same functionality in a cleaner way by using annotations instead 
of crude reflection-based function calls. 
Let's make a contract: if there is a @Param annotation on an attribute of a 
parser object, we initialize that attribute. What do you say?
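
To make the proposed contract concrete, here is a minimal sketch of annotation-driven initialization. Everything in it is an assumption for illustration: the @Param annotation shown here, its name() and required() attributes, DemoParser, and the configure(...) helper are hypothetical and are not existing Tika APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class ParamInjectionSketch {

    /** Hypothetical marker annotation; not an existing Tika API. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Param {
        String name() default "";          // config key; empty means "use the field name"
        boolean required() default false;  // fail if the key is absent
    }

    /** Hypothetical parser with two configurable fields and one plain field. */
    static class DemoParser {
        @Param(name = "maxDepth") int maxDepth = 1;
        @Param boolean strict;
        int untouched = 7; // no annotation, so configure(...) never assigns it
    }

    /** Assigns each @Param-annotated field of target from the matching map entry. */
    static void configure(Object target, Map<String, Object> params) {
        for (Field field : target.getClass().getDeclaredFields()) {
            Param param = field.getAnnotation(Param.class);
            if (param == null) {
                continue; // the contract only covers annotated fields
            }
            String key = param.name().isEmpty() ? field.getName() : param.name();
            if (params.containsKey(key)) {
                try {
                    field.setAccessible(true);
                    // Field.set unwraps Integer/Boolean for primitive fields and
                    // throws IllegalArgumentException on a type mismatch.
                    field.set(target, params.get(key));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot set field " + field.getName(), e);
                }
            } else if (param.required()) {
                throw new IllegalArgumentException("Missing required param: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("maxDepth", 5);
        params.put("strict", true);
        DemoParser parser = new DemoParser();
        configure(parser, params);
        System.out.println(parser.maxDepth + " " + parser.strict + " " + parser.untouched);
    }
}
```

The type check comes for free from Field.set, which is one reason typed parameters in the configuration pair well with this approach.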

bq. What's the benefit of configuring with a ParseContext instead of a 
Map<String, Param<?>>? Along the same lines, configure doesn't actually 
configure, it just sets the Map... Should we rename it as a setter? Or should 
we make it do something?

1. Right! The base class method doesn't configure anything. Child parsers 
should override that method to configure themselves; tika-core supplies the 
required parameters. 
2. I think using a wrapper object like ParseContext allows us to make 
improvements later without breaking compatibility. Using Map<String, Param<?>> 
is too limiting for future enhancements. I have a feeling that the parsers are 
going to become more complex and will require many other resources to 
initialize.
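
The two points above can be sketched roughly as follows. This is only an illustration under stated assumptions: DemoContext is a stand-in for a wrapper object in the spirit of ParseContext (it is not Tika's class), and DemoConfigurableParser, DemoOcrParser, and the "timeoutSeconds" key are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigureSketch {

    /** Stand-in wrapper object; hypothetical, not Tika's ParseContext. */
    static class DemoContext {
        private final Map<String, Object> params = new HashMap<>();

        DemoContext with(String name, Object value) {
            params.put(name, value);
            return this;
        }

        /** Returns the named value if present and of the given type, else the fallback. */
        <T> T getParam(String name, Class<T> type, T fallback) {
            Object value = params.get(name);
            return type.isInstance(value) ? type.cast(value) : fallback;
        }
        // More accessors (resources, services, ...) could be added later
        // without changing the configure(...) signature below.
    }

    /** Base class: configure(...) only stores the context. */
    abstract static class DemoConfigurableParser {
        protected DemoContext context;

        public void configure(DemoContext context) {
            this.context = context;
        }
    }

    /** Child parser: overrides configure(...) to read the values it needs. */
    static class DemoOcrParser extends DemoConfigurableParser {
        int timeoutSeconds = 120; // default used when the key is absent

        @Override
        public void configure(DemoContext context) {
            super.configure(context);
            timeoutSeconds = context.getParam("timeoutSeconds", Integer.class, timeoutSeconds);
        }
    }

    public static void main(String[] args) {
        DemoOcrParser parser = new DemoOcrParser();
        parser.configure(new DemoContext().with("timeoutSeconds", 30));
        System.out.println(parser.timeoutSeconds);
    }
}
```

With a bare Map as the argument, adding anything beyond named parameters would mean changing the method signature in every parser; the wrapper keeps that change local to one class.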

bq. IIRC AbstractParser is only there as syntactic sugar to gloss over the 
newer requirement to pass in a ParseContext at parse time. In 2.0, 
AbstractParser will go away...I think. So, might be better to make 
ConfigurableParser an abstract class that handles all of the functionality 
instead of an interface

I missed the Tika 2.0 design discussions. +1 for the suggestion.

bq. Not crucial for the initial patch, but it would be great if we could add 
error checking/automatic configuration (perhaps via reflection) at the level of 
the ConfigurableParser so that each parser (configurable?) doesn't have to set 
their own params.

+1, let's define a @Param annotation and take it forward. 


> support parser parameters with type (int, double, etc) in configuration XML 
> file
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-1986
>                 URL: https://issues.apache.org/jira/browse/TIKA-1986
>             Project: Tika
>          Issue Type: Sub-task
>          Components: config
>            Reporter: Thamme Gowda
>             Fix For: 1.14
>
>
> Tika Configuration should be enhanced to support basic types like int, 
> double, boolean, url, file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
