Jörn,

Great, that helps quite a lot! It is similar to the main method of that
class, but your explanations and the trick of integrating the named entities
are the real silver bullet of your post here. Thanks a bunch.

I will play with that tonight.

What I want is to define a few IE algorithms from that.

Are there any references on Information Extraction using OpenNLP that you
would recommend as well?

Thanks,

Carlos.

On Thu, Jun 14, 2012 at 9:14 AM, Jörn Kottmann <[email protected]> wrote:

> Hello,
>
> the input for the coreference component needs to be preprocessed,
> with the sentence detector, tokenizer, parser and name finders.
>
> You can do this via API and our documentation provides sample code for
> each of these steps.
>
> The only tricky part is to get the named entities into the parse tree.
> Here is a sample:
> Parse parse; // returned from parser
> Span[] personEntities;  // returned from person name model
> ....
> Parse.addNames("person", personEntities, parse.getTagNodes());
>
> After this the person names are inserted into the parse tree. You need
> to repeat this step for every entity type you would like to reference.
> The entity type tags such as "person" are currently hard coded. You can
> find a list in TreebankNameFinder.NAME_TYPES
> (I believe that's a trunk-only class).
>
> Before you start with the rest you should download all the coreferencer
> models for 1.4
> into one directory, similar to the structure on the server.
>
> Now we are coming to the coreference resolution code:
> Linker treebankLinker = new TreebankLinker("/home/joern/corefmodel/",
> LinkerMode.TEST);
>
> This will create the linker for you.
>
> First all the mentions need to be recognized and afterward they are linked
> together.
> For every sentence you do this:
> Parse p = ...; // contains a parse of a sentence with names
> Mention[] extents = treebankLinker.getMentionFinder().getMentions(new
> DefaultParse(p,sentenceNumber));
> for (int ei=0,en=extents.length;ei<en;ei++) {
>  if (extents[ei].getParse() == null) {
>    Parse snp = new Parse(p.getText(),extents[ei].getSpan(),"NML",1.0,0);
>    p.insert(snp);
>    extents[ei].setParse(new DefaultParse(snp, sentenceNumber));
>  }
> }
> sentenceNumber++;
>
> The result is the set of mentions for each sentence. All these mention
> objects should be copied into a single list,
> e.g. via document.addAll(Arrays.asList(extents)) (document is of type
> List<Mention>).
>
> Now the mentions of one document can be linked together:
> DiscourseEntity[] entities = treebankLinker.getEntities(document.toArray(new
> Mention[document.size()]));
>
> The entities array now contains the various detected and linked entities.
> Usually you want to filter out entities which have just a single mention.
> A DiscourseEntity groups mentions together. A mention does not have to be
> a named entity; other noun phrases are valid mentions as well.
>
> Hope that helps,
> Jörn
>
>
>
> On 06/13/2012 07:41 PM, Carlos Scheidecker wrote:
>
>> Jörn,
>>
>> I just want to know how it works for now. I've been following the one from
>> StanfordNLP as well.
>>
>> Basically, I want to first know if I just pass raw text to it or if I have
>> to tag that first. Looks like I need to do POS tagging first.
>>
>> I want to be able to pass a text and get the references as object lists
>> from the API.
>>
>> So I can fetch the relations.
>>
>> I still need to take some time here and read more of the source code unless
>> you have some pointers.
>>
>> Thanks,
>>
>> Carlos.
>>
>>
>>
>> On Wed, Jun 13, 2012 at 11:23 AM, Jörn Kottmann<[email protected]>
>>  wrote:
>>
>>  On 06/13/2012 07:07 PM, Carlos Scheidecker wrote:
>>>
>>>  Thanks. So for now we can only use the models from 1.4. I saw that a
>>>> training class was added recently. How do you use that?
>>>>
>>>>  That's still work in progress. On which data do you want to train?
>>>
>>> You need to produce data in a certain format, there should be a sample in
>>> the test folder.
>>> It's basically Penn Treebank style plus some nodes to label the mentions
>>> in the tree.
>>>
>>> The parse trees of a document are grouped and sent document-wise
>>> to the trainer via a stream. After this is done a new model will be
>>> trained.
>>>
>>> The OpenNLP coreferencer currently works only on noun phrases; other
>>> mentions
>>> like verbs will not be resolved (in case you want to train on OntoNotes).
>>>
>>> Jörn
>>>
>>>
>>>
>>>
>
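
A minimal Java sketch of the two bookkeeping steps described in Jörn's
quoted message: collecting the per-sentence mention arrays into one
document-level list, and dropping entities that have only a single mention.
To keep it runnable without the OpenNLP jars and models, a mention is a plain
String and an entity is a list of mention strings. These stand-ins, and the
names CorefBookkeeping, addSentenceMentions and dropSingletons, are
assumptions made for illustration and are not OpenNLP API. With the real
classes, Mention[] and DiscourseEntity[] would take their place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CorefBookkeeping {

    // Append one sentence's mentions to the document-level list.
    // An array cannot be passed to List.addAll directly, so it is wrapped first.
    // (String is a hypothetical stand-in for a mention object.)
    public static void addSentenceMentions(List<String> document, String[] extents) {
        document.addAll(Arrays.asList(extents));
    }

    // Keep only entities that group two or more mentions.
    // (List<String> is a hypothetical stand-in for a linked entity.)
    public static List<List<String>> dropSingletons(List<List<String>> entities) {
        List<List<String>> linked = new ArrayList<>();
        for (List<String> mentions : entities) {
            if (mentions.size() > 1) {
                linked.add(mentions);
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        List<String> document = new ArrayList<>();
        addSentenceMentions(document, new String[] {"Carlos", "a question"});
        addSentenceMentions(document, new String[] {"He", "an answer"});
        System.out.println(document.size() + " mentions collected");

        // Hand-written stand-in for linker output; a real linker decides the grouping.
        List<List<String>> entities = new ArrayList<>();
        entities.add(Arrays.asList("Carlos", "He"));
        entities.add(Arrays.asList("a question"));
        entities.add(Arrays.asList("an answer"));

        for (List<String> entity : dropSingletons(entities)) {
            System.out.println(entity);
        }
    }
}
```

The filter keeps the original entity objects rather than copying them, so
whatever per-entity information the linker attached stays available for the
later extraction steps.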
