[ 
https://issues.apache.org/jira/browse/OPENNLP-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293972#comment-16293972
 ] 

Koji Sekiguchi edited comment on OPENNLP-1154 at 12/17/17 12:18 AM:
--------------------------------------------------------------------

What this patch does:

* Move all static *FeatureGeneratorFactory classes out of GeneratorFactory.java 
and make them individual factory classes such as 
BrownClusterTokenFeatureGeneratorFactory.java and 
BigramNameFeatureGeneratorFactory.java, so that users can avoid specifying 
nested class names such as 
opennlp.tools.util.featuregen.GeneratorFactory.BigramNameFeatureGeneratorFactory
in the XML config file.

* Provide an AbstractXmlFeatureGeneratorFactory class, which all 
*FeatureGeneratorFactory classes must extend. It has an init() method that the 
framework calls when the XML config file is read. This lets 
*FeatureGeneratorFactory classes pick up their parameters when they are 
specified in the nested form, like:

{code:xml}
       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
          <int name="prevLength">2</int>
          <int name="nextLength">2</int>
          <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
       </generator>
{code}

* *FeatureGeneratorFactory classes can read the parameters set in the XML 
config file via getter methods, e.g. getInt("parameter name") and 
getStr("parameter name"), as long as they extend 
AbstractXmlFeatureGeneratorFactory. AbstractXmlFeatureGeneratorFactory stores 
the parameters in a LinkedHashMap<String,Object> in its init() method. I used 
LinkedHashMap rather than HashMap because the order in which the parameters 
are written must be preserved: multiple <generator …/> elements can be 
specified in a parent FeatureGeneratorFactory, although only 
AggregatedFeatureGeneratorFactory supports multiple sub-generators for now.

* The classic format is still supported for backward compatibility. I added 
test cases that check both the classic and the new format. The classic-format 
XML files are named *_classic.xml and live under the src/test/resources 
folder. GeneratorFactory detects which format is in use in its 
createGenerator() method.

* The extractArtifactSerializerMappings() method supports both the classic and 
the new format.
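
To make the init()/getter idea above concrete, here is a minimal, self-contained sketch. It is not the code from the patch: AbstractSketchFactory, WindowSketchFactory, the "generator#N" key naming and the <str> element are all hypothetical names made up for illustration. It only shows how an insertion-ordered LinkedHashMap keeps parameters and sub-generators in the order they were written in the XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class; none of these types are the actual OpenNLP ones.
class ParamSketchDemo {
  public static void main(String[] args) {
    String xml = "<generator class=\"WindowSketchFactory\">"
        + "<int name=\"prevLength\">2</int>"
        + "<int name=\"nextLength\">3</int>"
        + "<generator class=\"TokenClassSketchFactory\"/>"
        + "</generator>";
    WindowSketchFactory factory = load(xml);
    System.out.println(factory.paramNames());
    System.out.println(factory.describe());
  }

  // Parses the XML string and hands the root element to the factory's init().
  static WindowSketchFactory load(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      WindowSketchFactory factory = new WindowSketchFactory();
      factory.init(root);
      return factory;
    } catch (Exception e) {
      throw new IllegalStateException("could not read config", e);
    }
  }
}

// Stand-in for the abstract base class described in the comment (assumption).
abstract class AbstractSketchFactory {
  // LinkedHashMap iterates in insertion order, i.e. the order written in the XML.
  private final Map<String, Object> params = new LinkedHashMap<>();

  void init(Element element) {
    NodeList children = element.getChildNodes();
    int generatorIndex = 0;
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (!(node instanceof Element)) {
        continue; // skip whitespace and comments
      }
      Element child = (Element) node;
      String tag = child.getTagName();
      if ("int".equals(tag)) {
        params.put(child.getAttribute("name"),
            Integer.parseInt(child.getTextContent().trim()));
      } else if ("str".equals(tag)) { // element name is an assumption
        params.put(child.getAttribute("name"), child.getTextContent().trim());
      } else if ("generator".equals(tag)) {
        // The sketch records only the class name of each sub-generator.
        params.put("generator#" + generatorIndex++, child.getAttribute("class"));
      }
    }
  }

  int getInt(String name) {
    Object value = params.get(name);
    if (!(value instanceof Integer)) {
      throw new IllegalArgumentException("missing int parameter: " + name);
    }
    return (Integer) value;
  }

  String getStr(String name) {
    Object value = params.get(name);
    if (!(value instanceof String)) {
      throw new IllegalArgumentException("missing string parameter: " + name);
    }
    return (String) value;
  }

  List<String> paramNames() {
    return new ArrayList<>(params.keySet());
  }
}

// Hypothetical concrete factory that only reads its parameters back.
class WindowSketchFactory extends AbstractSketchFactory {
  String describe() {
    return "window(prev=" + getInt("prevLength") + ", next=" + getInt("nextLength")
        + ") over " + getStr("generator#0");
  }
}
```

With a plain HashMap the iteration order of paramNames() would not be guaranteed, which is the reason given above for choosing LinkedHashMap.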




> change the XML format for feature generator config in NameFinder and POS 
> Tagger
> -------------------------------------------------------------------------------
>
>                 Key: OPENNLP-1154
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1154
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: 1.8.3
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>
> NameFinder provides many kinds of feature generator (factories). Users can 
> define their config via XML which looks like:
> {code:xml}
> <generators>
>   <cache> 
>     <generators>
>       <window prevLength = "2" nextLength = "2">          
>         <tokenclass/>
>       </window>
>       <window prevLength = "2" nextLength = "2">                
>         <token/>
>       </window>
>       <definition/>
>       <prevmap/>
>       <bigram/>
>       <sentence begin="true" end="false"/>
>     </generators>
>   </cache> 
> </generators>
> {code}
> If users want to implement their own feature generator, they can use <custom 
> .../>, but if they want two or more feature generators at once, they may be 
> able to do so only by providing a wrapper feature generator that wraps the 
> generators they actually want, which is not good.
> I'd like to suggest that we make the config format more flexible like below:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <args>
>     <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>       <args>
>         <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>           <args>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>           </args>
>         </generator>
>       </args>
>     </generator>
>   </args>
> </generator>
> {code}
> If <args>...</args> is too noisy, I'm thinking of another format as well:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>     <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>       </generator>
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>       </generator>
>     </generator>
>   </generator>
> </generator>
> {code}
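
The quoted description shows two config shapes: the classic one rooted at <generators> and the proposed one rooted at <generator class="...">. As a rough illustration of how a loader could tell them apart, here is a small sketch. The root-element check is an assumption made for this example; the actual rule used in createGenerator() may well differ. FormatSniffDemo and isNewFormat are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper; not part of OpenNLP.
class FormatSniffDemo {
  public static void main(String[] args) {
    String classic = "<generators><cache><generators><bigram/></generators></cache></generators>";
    String proposed = "<generator class=\"opennlp.tools.util.featuregen."
        + "AggregatedFeatureGeneratorFactory\"/>";
    System.out.println("classic  -> new format? " + isNewFormat(classic));
    System.out.println("proposed -> new format? " + isNewFormat(proposed));
  }

  // Assumption: the new format is recognizable by a <generator> root element
  // carrying a class attribute; anything else is treated as classic.
  static boolean isNewFormat(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      return "generator".equals(root.getTagName()) && root.hasAttribute("class");
    } catch (Exception e) {
      throw new IllegalArgumentException("config is not well-formed XML", e);
    }
  }
}
```

A check like this keeps existing classic configs loadable without any change on the user's side, which matches the back-compat goal stated in the comment.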



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
