[
https://issues.apache.org/jira/browse/OPENNLP-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293972#comment-16293972
]
Koji Sekiguchi edited comment on OPENNLP-1154 at 12/17/17 12:18 AM:
--------------------------------------------------------------------
This patch makes the following changes:
* Move all static *FeatureGeneratorFactory classes out of GeneratorFactory.java
and make them individual factory classes, such as
BrownClusterTokenFeatureGeneratorFactory.java and
BigramNameFeatureGeneratorFactory.java, so that users no longer have to specify
nested class names, e.g.
opennlp.tools.util.featuregen.GeneratorFactory.BigramNameFeatureGeneratorFactory,
in the XML config file.
* Provide an AbstractXmlFeatureGeneratorFactory class which all
*FeatureGeneratorFactory classes must extend. It has an init() method that the
framework calls when the XML config file is read. This lets
*FeatureGeneratorFactory classes pick up their parameters when they are
specified as nested elements, like:
{code:xml}
<generator
class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
<int name="prevLength">2</int>
<int name="nextLength">2</int>
<generator
class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
</generator>
{code}
* *FeatureGeneratorFactory classes can read parameters set in the XML config
file via getter methods, e.g. getInt("parameter name") and
getStr("parameter name"), as long as they extend
AbstractXmlFeatureGeneratorFactory. AbstractXmlFeatureGeneratorFactory stores
the parameters in a LinkedHashMap<String,Object> in its init() method. I used
LinkedHashMap rather than HashMap because the order in which parameters are
written must be preserved: multiple <generator …/> elements can be specified in
a parent FeatureGeneratorFactory, although currently only
AggregatedFeatureGeneratorFactory supports multiple sub-generators.
* The classic format is still supported for backward compatibility. I added
test cases that check both the classic and the new format. The classic-format
XML files are named *_classic.xml and live under the src/test/resources folder.
GeneratorFactory detects which format is in use in its createGenerator()
method.
* The extractArtifactSerializerMappings() method supports both the classic and
the new format.
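To illustrate the parameter handling described in the list above, here is a minimal, self-contained sketch. It is not the code from the patch: the names ParamFactorySketch, Base, WindowFactory, setParam(), paramNames() and describe() are hypothetical, and how init() actually fills the map is an assumption. Only getInt(), getStr() and the use of a LinkedHashMap<String,Object> come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the actual AbstractXmlFeatureGeneratorFactory.
class ParamFactorySketch {

  // Stand-in for the abstract base class that holds the parsed parameters.
  abstract static class Base {
    // LinkedHashMap iterates in insertion order, so parameters come back
    // in the order in which they were written in the config file.
    private final Map<String, Object> params = new LinkedHashMap<>();

    // Assumption: the framework calls this once per parsed child element.
    void setParam(String name, Object value) {
      params.put(name, value);
    }

    int getInt(String name) {
      Object value = params.get(name);
      if (!(value instanceof Integer)) {
        throw new IllegalArgumentException("no int parameter: " + name);
      }
      return (Integer) value;
    }

    String getStr(String name) {
      Object value = params.get(name);
      if (!(value instanceof String)) {
        throw new IllegalArgumentException("no string parameter: " + name);
      }
      return (String) value;
    }

    // Parameter names in the order they were set.
    List<String> paramNames() {
      return new ArrayList<>(params.keySet());
    }
  }

  // Hypothetical concrete factory that only reads its two window sizes.
  static class WindowFactory extends Base {
    String describe() {
      return "window(prev=" + getInt("prevLength")
          + ", next=" + getInt("nextLength") + ")";
    }
  }

  public static void main(String[] args) {
    WindowFactory factory = new WindowFactory();
    factory.setParam("prevLength", 2);
    factory.setParam("nextLength", 2);
    System.out.println(factory.describe());
    System.out.println(factory.paramNames());
  }
}
```

With a plain HashMap the iteration order of paramNames() would not be guaranteed, which is why insertion order matters once several sub-generators share one parent.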
> change the XML format for feature generator config in NameFinder and POS
> Tagger
> -------------------------------------------------------------------------------
>
> Key: OPENNLP-1154
> URL: https://issues.apache.org/jira/browse/OPENNLP-1154
> Project: OpenNLP
> Issue Type: Improvement
> Components: Name Finder
> Affects Versions: 1.8.3
> Reporter: Koji Sekiguchi
> Assignee: Koji Sekiguchi
>
> NameFinder provides many kinds of feature generator (factories). Users can
> define their config via XML which looks like:
> {code:xml}
> <generators>
> <cache>
> <generators>
> <window prevLength = "2" nextLength = "2">
> <tokenclass/>
> </window>
> <window prevLength = "2" nextLength = "2">
> <token/>
> </window>
> <definition/>
> <prevmap/>
> <bigram/>
> <sentence begin="true" end="false"/>
> </generators>
> </cache>
> </generators>
> {code}
> If users want to implement their own feature generator, they can use <custom
> .../>. But if they want two or more feature generators at once, they may be
> able to do so by providing a wrapper feature generator that wraps the ones
> they actually want, but that is not a good solution.
> I'd like to suggest making the config format more flexible, as below:
> {code:xml}
> <generator
> class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
> <args>
> <generator
> class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
> <args>
> <generator
> class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
> <args>
> <generator
> class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
> <args>
> <int name="prevLength">2</int>
> <int name="nextLength">2</int>
> <generator
> class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
> </args>
> </generator>
> <generator
> class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
> <args>
> <int name="prevLength">2</int>
> <int name="nextLength">2</int>
> <generator
> class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
> </args>
> </generator>
> </args>
> </generator>
> </args>
> </generator>
> </args>
> </generator>
> {code}
> If <args>...</args> is too noisy, I'm also considering another format:
> {code:xml}
> <generator
> class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
> <generator
> class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
> <generator
> class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
> <generator
> class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
> <int name="prevLength">2</int>
> <int name="nextLength">2</int>
> <generator
> class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
> </generator>
> <generator
> class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
> <int name="prevLength">2</int>
> <int name="nextLength">2</int>
> <generator
> class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
> </generator>
> </generator>
> </generator>
> </generator>
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)