[ 
https://issues.apache.org/jira/browse/OPENNLP-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182635#comment-14182635
 ] 

Rodrigo Agerri commented on OPENNLP-725:
----------------------------------------

Hi, 

OK, this is a first draft implementation. The loadResources() method in 
TokenNameFinderTrainerTool now does the following: 

1. Reads the XML descriptor to find custom generator resources (as 
before).
2. Reads the XML descriptor to get all the feature elements (I have put this 
function in GeneratorFactory, but this is debatable, of course).
3. Iterates over all the files in the -resource directory and over the elements 
found in step 2: 
  + If an element has a "dict" attribute whose value equals the file name, 
then the serializer key is the element tag name. 
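
To make step 3 concrete, here is a minimal sketch of the matching logic. It is 
not the actual patch: the class name DictKeySketch, the method serializerKeys() 
and the sample descriptor are made up for illustration, and it uses only the 
standard Java XML API rather than any OpenNLP class. 

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; none of these names exist in OpenNLP.
public class DictKeySketch {

    // Maps each resource file name to the tag name of the descriptor element
    // whose "dict" attribute names that file. Files that no element refers to
    // are left out of the result.
    public static Map<String, String> serializerKeys(
            String descriptorXml, List<String> resourceFileNames) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        descriptorXml.getBytes(StandardCharsets.UTF_8)));

        // "*" selects every element in the descriptor, in document order.
        NodeList elements = doc.getElementsByTagName("*");

        Map<String, String> keys = new LinkedHashMap<>();
        for (String fileName : resourceFileNames) {
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                if (element.hasAttribute("dict")
                        && element.getAttribute("dict").equals(fileName)) {
                    // The element tag name becomes the serializer key.
                    keys.put(fileName, element.getTagName());
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample descriptor, built around the element from the issue.
        String xml = "<generators>"
                + "<w2vwordcluster dict=\"word2vec-test.txt\" />"
                + "</generators>";
        System.out.println(
                serializerKeys(xml, List.of("word2vec-test.txt", "unused.txt")));
    }
}
```

With this approach the lookup no longer depends on the file extension of the 
resource, only on which element refers to the file. 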

Now every serializer created in 
TokenNameFinderModel.createArtifactSerializers() needs to be registered with 
its element tag name as key (which is the same name used in the factoryMap in 
GeneratorFactory). 

Both the "dict" attribute and the tag names are already hardcoded in 
GeneratorFactory, so no extra hardcoded aspects have been added. I still need 
to test the case where different files are mapped to different feature 
generators requiring different serializers, but it seems fine. 

As an aside on what you mentioned about removing the 
createArtifactSerializers method from TokenNameFinderModel: that would be great. 
Perhaps by implementing something similar to how the serializers are handled 
for the custom feature generators? 





> TokenNameFinderTrainer CLI not loading resources
> ------------------------------------------------
>
>                 Key: OPENNLP-725
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-725
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Name Finder
>    Affects Versions: 1.6.0
>            Reporter: Rodrigo Agerri
>            Assignee: Rodrigo Agerri
>             Fix For: 1.6.0
>
>
> Passing an XML featuregen descriptor to the CLI TokenNameFinderTrainer with a 
> line such as 
> <w2vwordcluster dict="word2vec-test.txt" />
>  
> and with the -resource parameter properly set, the loadResources() method 
> does not get the right serializer to create the resource (line 130 of 
> TokenNameFinderTrainerTool class). It looks in the ArtifactSerializers map 
> created at the beginning of the method but does not find a value for the key 
> (which is the file extension of the lexicon?). 
> Proposed solution: get the appropriate serializer from the element class 
> (e.g. w2vwordcluster). 
> Any comments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
