Rupert Westenthaler created STANBOL-1231:
--------------------------------------------

             Summary: Add French Treebank+ Tagset to OpenNLP POS tagging engine
                 Key: STANBOL-1231
                 URL: https://issues.apache.org/jira/browse/STANBOL-1231
             Project: Stanbol
          Issue Type: New Feature
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Nicolas Hernandez has made OpenNLP models for French available at [1]. These 
models seem to use a tagset published by "Crabbé & Candito, 2008", which is best 
described on page 8 of [2]. Information on the main categories can be found at 
[3].

To use these models with the OpenNLP POS Tagging Engine, the PosTagSetRegistry 
should be extended with a TagSet mapping for this tagset.
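
As a rough illustration of what such a mapping expresses, the sketch below maps a 
few French Treebank+ tags to coarse lexical categories. It is not the Stanbol API: 
the dictionary, the category labels and the map_tag helper are hypothetical names 
chosen for this example, and only a subset of the tagset is covered. The 
authoritative tag list is the one on page 8 of [2].

```python
# Partial, illustrative mapping from French Treebank+ tags to coarse categories.
# Assumption: the category labels are plain strings invented for this sketch;
# they are not the enums used by the Stanbol PosTagSetRegistry.
FTB_PLUS_TO_CATEGORY = {
    "NC": "Noun",           # common noun
    "NPP": "Noun",          # proper noun
    "V": "Verb",            # finite verb
    "VINF": "Verb",         # infinitive
    "VPP": "Verb",          # past participle
    "ADJ": "Adjective",
    "ADV": "Adverb",
    "P": "Adposition",      # preposition
    "CC": "Conjunction",    # coordinating conjunction
    "CS": "Conjunction",    # subordinating conjunction
    "PRO": "Pronoun",
    "PONCT": "Punctuation",
}


def map_tag(tag, default="Unmapped"):
    """Return the coarse category for a tag, or `default` if it is not mapped."""
    return FTB_PLUS_TO_CATEGORY.get(tag, default)
```

Tags missing from the table fall back to a default value, so an incomplete mapping 
stays visible instead of failing silently.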

NOTE: Users who want to use those models will need to download them from [1], 
extract the archive, rename the files to fr-sent.bin, fr-token.bin, 
fr-pos-maxent.bin and fr-chunker.bin, and copy those files to the Stanbol 
datafiles directory (by default "stanbol/datafiles").
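
The rename-and-copy step could be scripted roughly as follows. This is a minimal 
sketch: the names of the files inside the downloaded archive are not stated here, 
so the caller has to supply them. The install_models function and its parameters 
are hypothetical helpers, not part of Stanbol.

```python
import shutil
from pathlib import Path

# File names expected in the datafiles directory (taken from the note above).
TARGET_NAMES = ("fr-sent.bin", "fr-token.bin", "fr-pos-maxent.bin", "fr-chunker.bin")


def install_models(extracted, datafiles_dir="stanbol/datafiles"):
    """Copy extracted model files into the datafiles directory under the expected names.

    `extracted` maps each target name to the path of the matching file from the
    extracted archive (assumption: the caller knows which file is which).
    """
    missing = [name for name in TARGET_NAMES if name not in extracted]
    if missing:
        raise ValueError("no source file given for: " + ", ".join(missing))

    target_dir = Path(datafiles_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    installed = []
    for name in TARGET_NAMES:
        source = Path(extracted[name])
        if not source.is_file():
            raise FileNotFoundError(source)
        destination = target_dir / name
        shutil.copyfile(source, destination)  # copy under the new name
        installed.append(destination)
    return installed
```

The function is called with a dictionary whose keys are the four target names and 
whose values are the paths of the extracted files. It copies rather than moves, so 
the extracted archive is left untouched.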





[1] 
http://enicolashernandez.blogspot.co.at/2012/12/apache-opennlp-fr-models.html
[2] 
http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
[3] http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-en.php



--
This message was sent by Atlassian JIRA
(v6.1#6144)
