Re: Event Extraction Engine

Cristian Petroaca Sun, 20 Sep 2015 06:15:07 -0700

Hi Dileepa,

I've been thinking more about the approach using a Word Sense
Disambiguation tool to classify the verb in the sentence and I think it may
be a good approach. The verb seems to be the event trigger and once you
know its actual meaning (by applying a WordNet class or some other DB used
for WSD) then I think it's quite straightforward to identify the actors in
the event (agent, patient, instrument, etc) by applying some user defined
rules for that verb class.
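
To make the idea a bit more concrete, here is a minimal sketch of what such
user defined rules could look like. It assumes the verb has already been
disambiguated and the generic arguments (agent, patient, ...) have already been
found by some other component. The class labels, role names, the RULES table
and the build_event function are all hypothetical and only there for
illustration; nothing here is an existing Stanbol or WordNet API.

```python
from dataclasses import dataclass

# Hypothetical rule table: verb class -> (event type, generic role -> specific role).
# The class labels and role names are illustrative assumptions only.
RULES = {
    "verb.possession": ("Acquisition", {"agent": "buyer", "patient": "bought_entity"}),
    "verb.competition": ("Attack", {"agent": "attacker", "patient": "target"}),
}


@dataclass
class Event:
    event_type: str
    trigger: str
    roles: dict


def build_event(trigger, verb_class, arguments):
    """Apply the rule for an already-disambiguated verb class.

    `arguments` maps generic roles (agent, patient, ...) to the text of the
    entity filling that role. Returns None when no rule exists for the class.
    """
    rule = RULES.get(verb_class)
    if rule is None:
        return None
    event_type, role_map = rule
    # Keep only the generic roles the rule knows about and rename them.
    roles = {role_map[g]: text for g, text in arguments.items() if g in role_map}
    return Event(event_type, trigger, roles)


event = build_event("buys", "verb.possession", {"agent": "Google", "patient": "Yahoo"})
print(event)
```

A verb class without a rule simply yields no event, so precision depends on
how good the disambiguation is and recall on how many classes have rules.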


For example, if you have the verb "attack", which can have multiple meanings
depending on the context, you would disambiguate it using WordNet like this:
On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <dileepajayak...@gmail.com>
wrote:

> Hi Cristian,
>
> Interesting ideas. Let me do some background reading on this, so I can also
> participate in the discussion better.
>
> Thanks,
> Dileepa
>
> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <
> cristian.petro...@gmail.com> wrote:
>
> > Another approach to this would be to use a semantic role labeling tool [1]
> > to determine the type of relation between the subject and object.
> >
> > Or we could use Word Sense Disambiguation to determine the WordNet class of
> > the verb (this way we have a standard relation definition) and based on
> > what relation type it is we can search for the subject and object using
> > dependency tree parsing in Stanford NLP.
> >
> > These 2 options ensure that we can have a much bigger recall but I'm not
> > sure about the precision...
> >
> > So I think we'll need to first settle on the method of implementing this
> > engine before starting anything.
> >
> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
> >
> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <
> > cristian.petro...@gmail.com> wrote:
> >
> > > Hi Dileepa,
> > >
> > > > Unfortunately I did not have the time to work on this at all so there is
> > > > no code base. But I'd be happy to start contributing something to
> > > > this engine and I think it would also be very helpful if you are able
> > > > to contribute to this as well.
> > > > I did get a chance to test the Stanford relation extractor which works
> > > > fine but it's quite limited to a handful of relation types (live_in,
> > > > located_in, org_based_in, work_for). So we would need to train other
> > > > models if we want to increase the number of relation types.
> > > > I also think that the Event Extraction Engine should work in conjunction
> > > > with any coreference and comention engines we have to increase the
> > > > relation count.
> > >
> > > Regards,
> > > Cristian
> > >
> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <
> > > dileepajayak...@gmail.com> wrote:
> > >
> > >> Hi Cristian and all,
> > >>
> > > >> Can I please know the status of this event extraction engine? Event
> > > >> extraction is a really useful feature for semantic enhancements and I am
> > > >> interested in collaborating on this work.
> > > >> Is there any code base you are currently working on for this engine?
> > >>
> > >> Thanks,
> > >> Dileepa
> > >>
> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <
> > >> cristian.petro...@gmail.com> wrote:
> > >>
> > >> > Hi Edi,
> > >> >
> > >> > Thanks for the info. Stanford Relation Extractor sounds very
> > >> interesting.
> > >> > I'll give it a try.
> > >> >
> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
> > >> >
> > >> > > Hi Cristian,
> > > >> > > Here are a few more resources on Semantic Role/Relationship Labeling:
> > > >> > > 1. FrameNet, VerbNet and WordNet on the data side
> > > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
> > > >> > > The last one links to a great paper which I believe holds great potential
> > > >> > > for Stanbol:
> > > >> > > A Linear Programming Formulation for Global Inference in Natural Language
> > > >> > > Tasks
> > >> > >
> > >> > > |   |
> > >> > > |   |   |   |   |   |
> > >> > > | A Linear Programming Formulation for Global Inference in Natural
> > >> > > Language Tasks  Last abstract |Contents |Next abstract A Linear
> > >> > Programming
> > >> > > Formulation for Global Inference in Natural Language Tasks  |
> > >> > > |  |
> > >> > > | View on www.cnts.ua.ac.be | Preview by Yahoo |
> > >> > > |  |
> > >> > > |   |
> > >> > >
> > >> > >
> > >> > >
> > >> > > Edi
> > >> > >       From: Cristian Petroaca <cristian.petro...@gmail.com>
> > >> > >  To: dev@stanbol.apache.org
> > >> > >  Sent: Sunday, February 15, 2015 6:34 AM
> > >> > >  Subject: Event Extraction Engine
> > >> > >
> > >> > > Hi All,
> > >> > >
> > >> > > Quite a while ago I started a discussion on this list about Event
> > >> > > Extraction from text. See
> > >> > > https://issues.apache.org/jira/browse/STANBOL-1121
> > >> > > .
> > >> > >
> > > >> > > I'd like to get started on the actual work and I have been thinking how to
> > > >> > > best approach this and there are some things that I would do differently
> > > >> > > than what the JIRA describes. I'd like to get your feedback on it.
> > >> > >
> > >> > > Basically the main approach would be:
> > >> > >
> > >> > > 1. Detect all NERs and their co-references.
> > >> > >
> > > >> > > 2. Apply semantic role labeling on the sentences where the above mentioned
> > > >> > > NERs reside.
> > > >> > > I found some interesting Semantic Role labeling libraries such as
> > > >> > > https://code.google.com/p/mate-tools/ or
> > > >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> > > >> > > With this I'll be able to detect the Agent, the Verb (action) and the
> > > >> > > Patient and Instruments.
> > >> > >
> > > >> > > This could be a minimal implementation of the engine. After that I can
> > > >> > > simply create the event data model as described in the JIRA and annotate
> > > >> > > the text.
> > > >> > > But this does not actually detect what kind of event it is or what the
> > > >> > > event-specific roles are that the entities have in the relation.
> > >> > >
> > > >> > > For example we can have the sentence "Google buys Yahoo for $100 million".
> > > >> > > There is a lot more to be said about this sentence than simply that
> > > >> > > "Google" is the agent and "Yahoo" is the patient. This is actually an
> > > >> > > acquisition event and "Google" is the buyer and "Yahoo" the bought entity.
> > > >> > > We would also need to align synonymous phrases such as "buy" or "acquire"
> > > >> > > to a common ontology so that we know that both refer to the same
> > > >> > > Acquisition event.
> > >> > >
> > > >> > > Having said that, we would add a new step:
> > >> > > 3. Try to detect event type and event details.
> > >> > >
> > >> > > This can be done by either:
> > >> > >
> > > >> > > 3.1 Rule based: hand written rules which would map a certain sentence
> > > >> > > structure, such as the name of the verb and the type of entities as agent
> > > >> > > and patient, to a certain event type.
> > > >> > > This has the benefit of being easy to build but is quite inflexible.
> > >> > >
> > > >> > > 3.2 Statistical based: train a model which would be able to classify an
> > > >> > > event type based on the features of the sentence such as verb type, entity
> > > >> > > type, role type, etc. This is the approach described here:
> > > >> > > http://web.stanford.edu/~jurafsky/mintz.pdf.
> > > >> > > This would be quite hard to build but quite flexible.
> > >> > >
> > > >> > > This 3rd step of detecting event types & details I think would be most
> > > >> > > efficient for domain specific events. We would have configs with several
> > > >> > > models for several domains available and the user could either use one of
> > > >> > > the pre-existing models or create a new one.
> > >> > >
> > > >> > > I don't have any practical experience with training models or text
> > > >> > > classification based on features (but I've been doing a lot of reading on
> > > >> > > it) so I'm not sure exactly how feasible what I said at point no 3
> > > >> > > actually is.
> > >> > >
> > >> > > Regards,
> > >> > > Cristian
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
