[
https://issues.apache.org/jira/browse/OPENNLP-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180631#comment-14180631
]
Joern Kottmann commented on OPENNLP-725:
----------------------------------------
Originally the serializers were always mapped by resource file extension.
That design wasn't extensible because the mapping was static. To fix that, we
introduced the concept that the feature generator has to specify which
serializers should be used on a per-resource basis. That helps to load the
artifacts from the resource folder initially. After they are loaded, the
SerializableArtifact interface is used to write them to the model. The model
also records which serializer classes have to be used to load them again.
The new way of handling the resources - by having them implement
SerializableArtifact - is easier to use than the approach we used initially. It
is probably a good idea to adapt all artifacts we have to that.
It looks like the resource for that dict is loaded by extension, so changing
the file extension to .w2vclasses will probably make it work.
Anyway, +1 to update it to use the new logic which doesn't depend on the file
ending. I think that is more natural to use anyway.
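
To make the difference concrete, here is a minimal sketch of the two lookup
strategies. It is not OpenNLP code: Serializer, TextSerializer,
lookupByExtension and declaredByGenerator are hypothetical stand-ins invented
for illustration, and only the file names come from this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SerializerLookupSketch {

    /** Hypothetical, simplified stand-in for a serializer: reads an artifact from a stream. */
    public interface Serializer<T> {
        T create(InputStream in) throws IOException;
    }

    /** Hypothetical serializer that reads the whole resource as UTF-8 text. */
    public static class TextSerializer implements Serializer<String> {
        @Override
        public String create(InputStream in) throws IOException {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Old style: one static map keyed by file extension.
    private static final Map<String, Serializer<?>> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("w2vclasses", new TextSerializer());
    }

    /** Returns the serializer registered for the file's extension, or null if none. */
    public static Serializer<?> lookupByExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        return BY_EXTENSION.get(extension);
    }

    // New style: the feature generator declares a serializer per resource name,
    // so the file extension no longer matters.
    public static Map<String, Serializer<?>> declaredByGenerator() {
        Map<String, Serializer<?>> declared = new HashMap<>();
        declared.put("word2vec-test.txt", new TextSerializer());
        return declared;
    }

    public static void main(String[] args) throws IOException {
        String name = "word2vec-test.txt";
        // The extension "txt" is not in the static map, so this lookup fails.
        System.out.println("found by extension: " + (lookupByExtension(name) != null));

        // The per-resource declaration finds a serializer regardless of extension.
        Serializer<?> serializer = declaredByGenerator().get(name);
        InputStream in = new ByteArrayInputStream("apple 12\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("loaded via declaration: " + serializer.create(in));
    }
}
```

In this sketch the extension lookup only succeeds after renaming the file to
word2vec-test.w2vclasses, while the declared mapping works with the original
name.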
The new approach should also be able to output a clearer error message. The
resource file is either missing from the resource directory or it is not valid.
In both cases a proper error can be given to the user.
I think now it just says it couldn't map a resource, which doesn't really point
a user in the right direction to solve the problem.
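
As a rough sketch of what the two distinct errors could look like, the
following separates "file missing" from "file not valid". The class and method
names are hypothetical, and the "token clusterId" line format is only an
assumption made for the example, not a statement about the real dictionary
format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceErrorSketch {

    /**
     * Returns null if the content looks valid, otherwise a description of the problem.
     * Assumed format (for illustration only): two whitespace-separated fields per line.
     */
    public static String problemWith(String content) {
        String[] lines = content.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue; // blank lines are ignored in this sketch
            }
            if (line.split("\\s+").length != 2) {
                return "line " + (i + 1) + " does not have exactly two fields";
            }
        }
        return null;
    }

    /** Tries to load a resource and reports which of the two failure cases occurred. */
    public static String describeLoad(Path resourceDir, String name) {
        Path file = resourceDir.resolve(name);
        if (!Files.isRegularFile(file)) {
            return "Resource '" + name + "' not found in directory " + resourceDir;
        }
        try {
            String problem = problemWith(Files.readString(file));
            return problem == null
                    ? "Resource '" + name + "' loaded"
                    : "Resource '" + name + "' is not valid: " + problem;
        } catch (IOException e) {
            return "Resource '" + name + "' is not valid: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical directory name; it is not expected to exist.
        System.out.println(describeLoad(Paths.get("no-such-resource-dir"), "word2vec-test.txt"));
    }
}
```

Either message tells the user whether to check the -resource directory or the
file contents, which the current "couldn't map a resource" message does not.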
> TokenNameFinderTrainer CLI not loading resources
> ------------------------------------------------
>
> Key: OPENNLP-725
> URL: https://issues.apache.org/jira/browse/OPENNLP-725
> Project: OpenNLP
> Issue Type: Bug
> Components: Name Finder
> Affects Versions: 1.6.0
> Reporter: Rodrigo Agerri
> Assignee: Rodrigo Agerri
> Fix For: 1.6.0
>
>
> Passing an XML featuregen descriptor to the CLI TokenNameFinderTrainer with a
> line such as
> <w2vwordcluster dict="word2vec-test.txt" />
>
> and with the -resource parameter properly set, the loadResources() method
> does not get the right serializer to create the resource (line 130 of
> TokenNameFinderTrainerTool class). It looks in the ArtifactSerializers map
> created at the beginning of the method but does not find a value for the key
> (which is the file extension of the lexicon?).
> Proposed solution: get the appropriate serializer from the element class
> (e.g. w2vwordcluster).
> Any comments?