We can actually train a CRF from Mallet with the existing infrastructure, and the code should still work (there may be minor issues). I tried that but couldn't get better results. We should probably get this code (mallet-addon) into good shape again and then see what the issues are. There might be some folks out there who want to do this for research or other cases. I believe one of my issues was bad hyperparameters.
With perceptron you can get good results quickly, out of the box.

Jörn

On Tue, Feb 7, 2017 at 3:59 PM, Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov> wrote:

> It would be interesting to compare the results of OpenNLP’s perceptron
> trained models, GIS trained models, and a vanilla CRF implementation (i.e.
> not specifically trained for a task). We can make a better decision on if
> we should spend the effort to implement a CRF. Every once in a while we
> see people ask “what can I do?”. Maybe the answer should be… given an
> ObjectStream<Event> or DataIndexer, train a CRFModel that extends
> AbstractModel. Your training class must extend AbstractEventTrainer and be
> serializable using AbstractModelWriter.
>
> Just my 2 cents.
> Daniel
>
> On 2/7/17, 9:51 AM, "Damiano Porta" <damianopo...@gmail.com> wrote:
>
>     I have good results with perceptron, but +1 for CRF
>
>     2017-02-07 15:42 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
>
>     > Hi Jörn,
>     >
>     > I think the best entity recognition systems use CRF’s. At some point
>     > we might want to consider adding them. As you know, ME classifiers suffer
>     > from label bias problem (see Lafferty et al.
>     > <http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers>).
>     > CRF’s deal with that issue. I believe that perceptrons suffer from the
>     > same problem. If you think the results are better, I have no problem.
>     > I think that our long-term goal should be to add a CRF, and make it the
>     > default for the NameFinder.
>     >
>     > Daniel
>     >
>     > On 2/6/17, 12:40 PM, "Joern Kottmann" <kottm...@gmail.com> wrote:
>     >
>     >     Hello all,
>     >
>     >     I would like to propose to switch the default training algorithm from
>     >     maxent gis to perceptron for the Name Finder. In all the data sets I
>     >     tried perceptron performs better than maxent gis and I believe that
>     >     would be a much more sensible default.
>     >
>     >     A user can always override the default by providing the algorithms
>     >     parameter for training.
>     >
>     >     What do you think?
>     >
>     >     Jörn
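
For anyone who wants to try the override mentioned in the original proposal, here is a minimal sketch of a training parameters file. It assumes the usual `Algorithm`, `Iterations`, and `Cutoff` keys of OpenNLP's training parameters. The file name and the iteration and cutoff values are illustrative assumptions, not settings recommended anywhere in this thread.

```
# params.txt (hypothetical file name)
# Select the perceptron trainer explicitly instead of relying on the default.
Algorithm=PERCEPTRON
# Illustrative values only; tune them for your own data.
Iterations=100
Cutoff=0
```

Such a file is typically passed to the command-line trainers via the `-params` option. Setting `Algorithm=MAXENT` should select the maxent GIS trainer instead, which makes it easy to compare the two on the same data.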