All,

I loaded the ModelBuilder-prototype project I mentioned earlier into the sandbox. Please take a look when you get a chance. I have already built a few decent models with it for location and person entities. The Example class walks through how it works, and you can work from there. The impls used are file-based, so you should be able to create a file of sentences, a file of known entities, and a blacklist file to run the examples.

A good use case is something like this: I have a corpus of data that can be broken into sentences. I know my data, so I can sample some of it to create lists of entities of different types based on random searches (a list of people's names, for example). From here the model builder takes the list of sentences and searches for all the known entities. If it finds any, it annotates the sentence and writes the annotated sentences to a file. The file is then used to create a model, the model is used to extract NEs, and the results (if they pass validation) are added to the list of known entities. Then the loop starts over:

1: read sentences
   extract knowns
   annotate sentences based on knowns
   build a model from the annotations
   extract NEs with the model
   add the names to the known entities
   goto 1
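
To make the loop concrete, here is a minimal, self-contained Python sketch of the same bootstrapping idea. It is not the prototype's code and does not use any OpenNLP API: the function names, the sample sentences, and the toy "model" (the set of words seen directly before an annotated name) are all assumptions made for illustration. The `<START:person> ... <END>` markers follow the inline annotation style used for OpenNLP name finder training data, and the loop stops when an iteration finds no new names rather than running forever.

```python
def annotate(tokens, known):
    """Return the sentence with known names wrapped in markers, or None if no hit."""
    out, hit = [], False
    for tok in tokens:
        if tok in known:
            out.append(f"<START:person> {tok} <END>")
            hit = True
        else:
            out.append(tok)
    return " ".join(out) if hit else None


def train(annotated):
    """Toy stand-in for model training: collect words that directly precede a name."""
    contexts = set()
    for line in annotated:
        toks = line.split()
        for prev, tok in zip(toks, toks[1:]):
            if tok == "<START:person>" and prev != "<END>":
                contexts.add(prev)
    return contexts


def extract(model, tokenized):
    """Toy stand-in for NE extraction: capitalized tokens after a learned context word."""
    found = set()
    for tokens in tokenized:
        for prev, tok in zip(tokens, tokens[1:]):
            if prev in model and tok[:1].isupper():
                found.add(tok)
    return found


def build_entities(sentences, known, blacklist, max_iterations=5):
    """Run the annotate / train / extract / validate loop until nothing new is found."""
    tokenized = [s.split() for s in sentences]   # read sentences
    known = set(known)
    annotated = []
    for _ in range(max_iterations):
        # extract knowns and annotate sentences based on them
        annotated = [a for a in (annotate(t, known) for t in tokenized) if a]
        model = train(annotated)                 # build a model from the annotations
        candidates = extract(model, tokenized)   # extract NEs with the model
        # validation here is only a blacklist check (an assumption)
        new_names = {c for c in candidates if c not in blacklist} - known
        if not new_names:
            break
        known |= new_names                       # add the names to the known entities
    return known, annotated


# Hypothetical sample data, standing in for the sentence, entity, and blacklist files.
sentences = [
    "Yesterday Mr. Smith met Mr. Jones",
    "Later Mr. Brown called Dr. Jones",
    "Then Dr. Miller thanked Mr. Smith",
    "Ask Dr. Who about it",
]
final_known, annotated = build_entities(sentences, {"Smith"}, {"Who"})
print(sorted(final_known))
```

Starting from the single seed name "Smith", the first pass learns the context "Mr.", which surfaces more names; those in turn teach the context "Dr." on the next pass. The blacklist keeps "Who" out of the known list even though it matches a learned context.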

Let me know what you think.

MG

On Sun, Oct 20, 2013 at 8:59 AM, <[email protected]> wrote:

> Author: markg
> Date: Sun Oct 20 12:59:13 2013
> New Revision: 1533881
>
> URL: http://svn.apache.org/r1533881
> Log:
> Prototype of a tool to allow users to create models from of a set of
> known entities based on their own data in the form of sentences.
> See the Example class in the .v2 package.
>
> Added:
>     opennlp/sandbox/modelbuilder-prototype/
