Hi Cristian and all,

Could you please let me know the status of this event extraction engine?
Event extraction is a really useful feature for semantic enhancements and I
am interested in collaborating on this work.
Is there a code base you are currently working on for this engine?

Thanks,
Dileepa

On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <
cristian.petro...@gmail.com> wrote:

> Hi Edi,
>
> Thanks for the info. Stanford Relation Extractor sounds very interesting.
> I'll give it a try.
>
> 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
>
> > Hi Cristian,
> > Here are a few more resources on Semantic Role/Relationship Labeling:
> > 1. FrameNet, VerbNet and WordNet on the data side
> > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
> > The last one links to a great paper which I believe holds great potential
> > for Stanbol:
> > A Linear Programming Formulation for Global Inference in Natural Language
> > Tasks
> >
> > (View on www.cnts.ua.ac.be)
> >
> >
> >
> > Edi
> >       From: Cristian Petroaca <cristian.petro...@gmail.com>
> >  To: dev@stanbol.apache.org
> >  Sent: Sunday, February 15, 2015 6:34 AM
> >  Subject: Event Extraction Engine
> >
> > Hi All,
> >
> > Quite a while ago I started a discussion on this list about Event
> > Extraction from text. See
> > https://issues.apache.org/jira/browse/STANBOL-1121
> > .
> >
> > I'd like to get started on the actual work. I have been thinking about
> > how best to approach this, and there are some things I would do
> > differently from what the JIRA issue describes. I'd like to get your
> > feedback on them.
> >
> > Basically the main approach would be:
> >
> > 1. Detect all named entities and their co-references.
> >
> > 2. Apply semantic role labeling to the sentences in which those named
> > entities occur.
> > I found some interesting semantic role labeling libraries, such as
> > https://code.google.com/p/mate-tools/ or
> > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> > With this I'll be able to detect the Agent, the Verb (action), the
> > Patient and the Instruments.
> >
> > This could be a minimal implementation of the engine. After that I can
> > simply create the event data model as described in the JIRA and annotate
> > the text.
> > But this does not actually detect what kind of event it is, or which
> > event-specific roles the entities have in the relation.
> >
> > For example, we can have the sentence "Google buys Yahoo for $100
> > million". There is a lot more to be said about this sentence than simply
> > that "Google" is the Agent and "Yahoo" is the Patient. This is actually
> > an acquisition event, and "Google" is the buyer and "Yahoo" the bought
> > entity.
> > We would also need to align synonymous phrases such as "buy" or
> > "acquire" to a common ontology, so that we know that both refer to the
> > same Acquisition event.
> >
> > Having said that, we would add a new step:
> > 3. Try to detect event type and event details.
> >
> > This can be done by either:
> >
> > 3.1 Rule-based: hand-written rules which would map a certain sentence
> > structure, such as the verb and the types of the entities acting as
> > agent and patient, to a certain event type.
> > This has the benefit of being easy to build, but it is quite inflexible.
> >
> > 3.2 Statistical: train a model which would be able to classify an event
> > type based on features of the sentence such as verb type, entity type,
> > role type, etc. This is the approach described here:
> > http://web.stanford.edu/~jurafsky/mintz.pdf.
> > This would be quite hard to build, but quite flexible.
> >
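
A minimal sketch of the rule-based option (3.1) above, assuming the semantic
role labeling step has already produced a verb lemma plus a typed agent and
patient. The Frame class, the RULES table, the entity type label ORGANIZATION
and the role names are all hypothetical, chosen only for illustration; they
are not part of Stanbol or of the data model in the JIRA issue.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical output of the SRL step: one predicate and its arguments."""
    verb_lemma: str
    agent: str
    agent_type: str
    patient: str
    patient_type: str


# Hand-written rules (illustrative only). Each rule lists the verb lemmas
# treated as synonyms, the required entity types of agent and patient, and
# the event type plus event-specific role names to assign.
RULES = [
    {
        "lemmas": {"buy", "acquire", "purchase"},
        "agent_type": "ORGANIZATION",
        "patient_type": "ORGANIZATION",
        "event_type": "Acquisition",
        "agent_role": "buyer",
        "patient_role": "acquired",
    },
]


def classify(frame: Frame) -> Optional[Dict[str, str]]:
    """Return the typed event for the first matching rule, or None."""
    for rule in RULES:
        if (frame.verb_lemma in rule["lemmas"]
                and frame.agent_type == rule["agent_type"]
                and frame.patient_type == rule["patient_type"]):
            return {
                "type": rule["event_type"],
                rule["agent_role"]: frame.agent,
                rule["patient_role"]: frame.patient,
            }
    return None


# "Google buys Yahoo for $100 million", reduced to its SRL frame.
event = classify(Frame("buy", "Google", "ORGANIZATION", "Yahoo", "ORGANIZATION"))
print(event)
```

Putting "buy" and "acquire" in the same rule is one simple way to align
synonyms to a single Acquisition event. A frame that matches no rule yields
None, which shows the inflexibility mentioned above.
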
> > I think this third step of detecting event types and details would be
> > most efficient for domain-specific events. We would have configs with
> > several models for several domains available, and the user could either
> > use one of the pre-existing models or create a new one.
> >
> > I don't have any practical experience with training models or with
> > feature-based text classification (but I've been doing a lot of reading
> > on it), so I'm not sure exactly how feasible point 3 actually is.
> >
> > Regards,
> > Cristian
> >
> >
> >
> >
>
