Hi Cristian,

Great stuff! I will look into the Stanford NLP project to see how we can do that.
Regards,
Dileepa

On Thu, Nov 19, 2015 at 2:06 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> I created a git repository which contains the event extraction engine
> here: https://github.com/cpetroaca/stanbol-event-extraction-engine.
> I've started working on an event rule schema that will also incorporate
> a generic ontology definition schema, so that one can say that
> #Person = http://dbpedia.org/Person and then use #Person in the rules.
> I think that the fact that Stanbol has access to a DBpedia or Yago index
> will be of great value when we want to define events with specific
> object classes.
>
> Dileepa, if you still want to get involved, you can take a look at the
> Stanbol Stanford NLP project here:
> https://github.com/westei/stanbol-stanfordnlp and figure out how to add
> Collapsed Dependencies
> (http://nlp.stanford.edu/software/dependencies_manual.pdf) to it. We'll
> need them to sort out the subject, verb and objects.
>
> Thanks,
> Cristian
>
> On Mon, Oct 12, 2015 at 3:31 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
> > Can we get a separate branch where we can start developing the Event
> > Extraction engine?
> >
> > Thanks
> >
> > On Sun, Sep 20, 2015 at 4:26 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >
> >> Sorry, hit send before finishing the mail :).
> >>
> >> So, you will disambiguate it using WordNet like this:
> >> http://wordnetweb.princeton.edu/perl/webwn?s=attack&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=000000
> >>
> >> And then you would have a rule file which would contain something like:
> >>
> >> event name= "attack"
> >> event trigger= wordnet class of type = wordnet id && pos=verb
> >> agent=dependency_type:nsubj&&entity_type=Person||Location
> >> patient=dependency_type:dobj&&entity_type=Person||Location
> >>
> >> The dependency type points to the Stanford NLP dependency tree relation
> >> types described here:
> >> http://nlp.stanford.edu/software/stanford-dependencies.shtml
> >> The entity_type points to either the NER class or the WordNet class of
> >> the noun in the noun phrase.
> >>
> >> This approach was inspired by this paper:
> >> http://www.surdeanu.info/mihai/papers/acl2015.pdf, with the difference
> >> that I'm using WSD to disambiguate the event trigger.
> >>
> >> I'll start doing some experiments with this approach.
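Below is a minimal, standalone sketch of how the nsubj/dobj relations that the rule file keys on could be read from Stanford CoreNLP's collapsed dependencies. It uses the plain CoreNLP pipeline API directly rather than the stanbol-stanfordnlp integration, the example sentence is made up, and the annotator list and annotation class names should be checked against the CoreNLP version actually used:

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;
import edu.stanford.nlp.util.CoreMap;

public class CollapsedDependenciesSketch {

    public static void main(String[] args) {
        // tokenize, split sentences, tag, lemmatize, run NER and parse;
        // the parse annotator also produces the dependency graphs
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("The rebels attacked the village.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // the collapsed, CC-processed dependencies of the sentence
            SemanticGraph deps = sentence.get(
                    SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            for (SemanticGraphEdge edge : deps.edgeIterable()) {
                // prints relation(governor, dependent), e.g. nsubj(attacked, rebels)
                System.out.println(edge.getRelation() + "("
                        + edge.getGovernor().word() + ", "
                        + edge.getDependent().word() + ")");
            }
        }
    }
}

For that sentence this should print, among others, something like nsubj(attacked, rebels) and dobj(attacked, village), which is exactly the shape the agent/patient patterns in the rule file match against.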
> >>
> >> On Sun, Sep 20, 2015 at 4:14 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>
> >>> Hi Dileepa,
> >>>
> >>> I've been thinking more about the approach of using a Word Sense
> >>> Disambiguation tool to classify the verb in the sentence and I think it
> >>> may be a good approach. The verb seems to be the event trigger, and once
> >>> you know its actual meaning (by applying a WordNet class or some other
> >>> DB used for WSD) I think it's quite straightforward to identify the
> >>> actors in the event (agent, patient, instrument, etc.) by applying some
> >>> user defined rules for that verb class.
> >>>
> >>> For example, if you have the verb "attack", which can have multiple
> >>> meanings depending on the context, you will disambiguate it using
> >>> WordNet like this:
> >>>
> >>> On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> >>>
> >>>> Hi Cristian,
> >>>>
> >>>> Interesting ideas. Let me do some background reading on this, so I can
> >>>> also participate in the discussion better.
> >>>>
> >>>> Thanks,
> >>>> Dileepa
> >>>>
> >>>> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>>
> >>>> > Another approach to this would be to use a semantic role labeling
> >>>> > tool [1] to determine the type of relation between the subject and
> >>>> > the object.
> >>>> >
> >>>> > Or we could use Word Sense Disambiguation to determine the WordNet
> >>>> > class of the verb (this way we have a standard relation definition)
> >>>> > and, based on what relation type it is, we can search for the
> >>>> > subject and object using dependency tree parsing in Stanford NLP.
> >>>> >
> >>>> > These 2 options ensure that we can have a much bigger recall, but
> >>>> > I'm not sure about the precision...
> >>>> >
> >>>> > So I think we'll need to first settle on the method of implementing
> >>>> > this engine before starting anything.
> >>>> >
> >>>> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
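To make the WordNet side of this more concrete, here is a small sketch that simply lists the candidate verb senses of "attack". It assumes the MIT JWI library and a local WordNet install (the dictionary path is only an example); an actual WSD step would still have to pick one of these senses from the sentence context, and that sense id would then play the role of the "wordnet id" in the event trigger rule:

import java.io.File;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;

public class AttackSensesSketch {

    public static void main(String[] args) throws Exception {
        // path to the WordNet "dict" directory; adjust for the local install
        IDictionary dict = new Dictionary(new File("/usr/local/WordNet-3.0/dict"));
        dict.open();

        // all verb senses of "attack"; WSD would choose one based on context
        IIndexWord idxWord = dict.getIndexWord("attack", POS.VERB);
        for (IWordID wordId : idxWord.getWordIDs()) {
            IWord word = dict.getWord(wordId);
            System.out.println(word.getSynset().getID() + " : "
                    + word.getSynset().getGloss());
        }
        dict.close();
    }
}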
> >>>> >
> >>>> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>> >
> >>>> > > Hi Dileepa,
> >>>> > >
> >>>> > > Unfortunately I did not have the time to work on this at all, so
> >>>> > > there is no code base. But I'd be happy to start contributing
> >>>> > > something to this engine and I think it would also be very helpful
> >>>> > > if you are able to contribute to it as well.
> >>>> > > I did get a chance to test the Stanford relation extractor, which
> >>>> > > works fine but is quite limited to a handful of relation types
> >>>> > > (live_in, located_in, org_based_in, work_for). So we would need to
> >>>> > > train other models if we want to increase the number of relation
> >>>> > > types.
> >>>> > > I also think that the Event Extraction Engine should work in
> >>>> > > conjunction with any coreference and comention engines we have, to
> >>>> > > increase the relation count.
> >>>> > >
> >>>> > > Regards,
> >>>> > > Cristian
> >>>> > >
> >>>> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> >>>> > >
> >>>> > >> Hi Cristian and all,
> >>>> > >>
> >>>> > >> Can I please know the status of this event extraction engine?
> >>>> > >> Event extraction is a really useful feature for semantic
> >>>> > >> enhancements and I am interested in collaborating on this work.
> >>>> > >> Is there any code base you are currently working on for this
> >>>> > >> engine?
> >>>> > >>
> >>>> > >> Thanks,
> >>>> > >> Dileepa
> >>>> > >>
> >>>> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>> > >>
> >>>> > >> > Hi Edi,
> >>>> > >> >
> >>>> > >> > Thanks for the info. The Stanford Relation Extractor sounds very
> >>>> > >> > interesting. I'll give it a try.
> >>>> > >> >
> >>>> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
> >>>> > >> >
> >>>> > >> > > Hi Cristian,
> >>>> > >> > >
> >>>> > >> > > Here are a few more resources on Semantic Role/Relationship
> >>>> > >> > > Labeling:
> >>>> > >> > > 1. FrameNet, VerbNet and WordNet on the data side
> >>>> > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the
> >>>> > >> > > code side
> >>>> > >> > >
> >>>> > >> > > The last one links to a great paper which I believe holds great
> >>>> > >> > > potential for Stanbol: "A Linear Programming Formulation for
> >>>> > >> > > Global Inference in Natural Language Tasks".
> >>>> > >> > >
> >>>> > >> > > Edi
> >>>> > >> > >
> >>>> > >> > > From: Cristian Petroaca <cristian.petro...@gmail.com>
> >>>> > >> > > To: dev@stanbol.apache.org
> >>>> > >> > > Sent: Sunday, February 15, 2015 6:34 AM
> >>>> > >> > > Subject: Event Extraction Engine
> >>>> > >> > >
> >>>> > >> > > Hi All,
> >>>> > >> > >
> >>>> > >> > > Quite a while ago I started a discussion on this list about
> >>>> > >> > > Event Extraction from text. See
> >>>> > >> > > https://issues.apache.org/jira/browse/STANBOL-1121.
> >>>> > >> > >
> >>>> > >> > > I'd like to get started on the actual work. I have been thinking
> >>>> > >> > > about how best to approach this, and there are some things that
> >>>> > >> > > I would do differently than what the JIRA describes. I'd like to
> >>>> > >> > > get your feedback on it.
> >>>> > >> > >
> >>>> > >> > > Basically the main approach would be:
> >>>> > >> > >
> >>>> > >> > > 1. Detect all NERs and their co-references.
> >>>> > >> > >
> >>>> > >> > > 2. Apply semantic role labeling on the sentences where the above
> >>>> > >> > > mentioned NERs reside.
> >>>> > >> > > I found some interesting Semantic Role Labeling libraries such
> >>>> > >> > > as https://code.google.com/p/mate-tools/ or
> >>>> > >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> >>>> > >> > > With this I'll be able to detect the Agent, the Verb (action),
> >>>> > >> > > the Patient and the Instruments.
> >>>> > >> > >
> >>>> > >> > > This could be a minimal implementation of the engine. After that
> >>>> > >> > > I can simply create the event data model as described in the
> >>>> > >> > > JIRA and annotate the text.
> >>>> > >> > > But this does not actually detect what kind of event it is or
> >>>> > >> > > what the event-specific roles are that the entities have in the
> >>>> > >> > > relation.
> >>>> > >> > >
> >>>> > >> > > For example, we can have the sentence "Google buys Yahoo for
> >>>> > >> > > $100 million". There is a lot more to be said about this
> >>>> > >> > > sentence than simply that "Google" is the agent and "Yahoo" is
> >>>> > >> > > the patient. This is actually an acquisition event: "Google" is
> >>>> > >> > > the buyer and "Yahoo" the bought entity.
> >>>> > >> > > We would also need to align synonym phrases such as "buy" or
> >>>> > >> > > "acquire" to a common ontology so that we know that both refer
> >>>> > >> > > to the same Acquisition event.
> >>>> > >> > >
> >>>> > >> > > Having said that, we would add a new step:
> >>>> > >> > > 3. Try to detect the event type and event details.
> >>>> > >> > >
> >>>> > >> > > This can be done by either:
> >>>> > >> > >
> >>>> > >> > > 3.1 Rule based: hand-written rules which would map a certain
> >>>> > >> > > sentence structure, such as the name of the verb and the types
> >>>> > >> > > of the entities acting as agent and patient, to a certain event
> >>>> > >> > > type (a rough sketch of such a rule is included below, after
> >>>> > >> > > 3.2).
> >>>> > >> > > This has the benefit of being easy to build but is quite
> >>>> > >> > > inflexible.
> >>>> > >> > >
> >>>> > >> > > 3.2 Statistical based: train a model which would be able to
> >>>> > >> > > classify an event type based on features of the sentence such
> >>>> > >> > > as verb type, entity type, role type, etc. This is the approach
> >>>> > >> > > described here: http://web.stanford.edu/~jurafsky/mintz.pdf.
> >>>> > >> > > This would be quite hard to build but quite flexible.
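As an illustration of option 3.1, here is a rough sketch of what such a hand-written rule could look like in code. All class, field and role names below are made up for the example (this is not an existing Stanbol API): a rule maps a set of trigger lemmas plus constraints on the agent and patient entity types to an event type, e.g. "buy"/"acquire" with an Organization agent and patient to an Acquisition event.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical rule representation for option 3.1; not an existing Stanbol class.
public class EventRule {

    private final String eventType;
    private final Set<String> triggerLemmas; // verbs that trigger the event
    private final Set<String> agentTypes;    // allowed entity types of the nsubj
    private final Set<String> patientTypes;  // allowed entity types of the dobj

    public EventRule(String eventType, Set<String> triggerLemmas,
                     Set<String> agentTypes, Set<String> patientTypes) {
        this.eventType = eventType;
        this.triggerLemmas = triggerLemmas;
        this.agentTypes = agentTypes;
        this.patientTypes = patientTypes;
    }

    /** Returns the event type if the extracted triple matches this rule, otherwise null. */
    public String match(String verbLemma, String agentType, String patientType) {
        if (triggerLemmas.contains(verbLemma)
                && agentTypes.contains(agentType)
                && patientTypes.contains(patientType)) {
            return eventType;
        }
        return null;
    }

    public static void main(String[] args) {
        // "Google buys Yahoo for $100 million" -> verb "buy", two Organization entities
        EventRule acquisition = new EventRule("Acquisition",
                new HashSet<>(Arrays.asList("buy", "acquire")),
                new HashSet<>(Arrays.asList("Organization")),
                new HashSet<>(Arrays.asList("Organization")));
        // prints "Acquisition"; the agent would then be labelled the buyer
        // and the patient the bought entity
        System.out.println(acquisition.match("buy", "Organization", "Organization"));
    }
}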
> >>>> > >> > >
> >>>> > >> > > I think this 3rd step of detecting event types & details would
> >>>> > >> > > be most efficient for domain-specific events. We would have
> >>>> > >> > > configs with several models for several domains available, and
> >>>> > >> > > the user could either use one of the pre-existing models or
> >>>> > >> > > create a new one.
> >>>> > >> > >
> >>>> > >> > > I don't have any practical experience with training models or
> >>>> > >> > > text classification based on features (but I've been doing a
> >>>> > >> > > lot of reading on it) so I'm not sure exactly how feasible what
> >>>> > >> > > I said at point no. 3 actually is.
> >>>> > >> > >
> >>>> > >> > > Regards,
> >>>> > >> > > Cristian
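For option 3.2, here is a sketch of the kind of flat feature strings such a classifier could be trained on, loosely following the lexical plus entity-type feature idea in the Mintz et al. paper. No particular ML library is assumed, and the feature names and the sense-id placeholder are made up for the example:

import java.util.ArrayList;
import java.util.List;

// Illustrative only: turns one candidate event mention into flat feature strings
// that could be fed to any standard classifier (maxent, SVM, ...).
public class EventFeatureSketch {

    public static List<String> features(String verbLemma, String verbSenseId,
                                        String agentType, String patientType,
                                        String depPath) {
        List<String> feats = new ArrayList<>();
        feats.add("verb=" + verbLemma);            // e.g. verb=buy
        feats.add("verbSense=" + verbSenseId);     // WordNet sense id chosen by WSD
        feats.add("agentType=" + agentType);       // e.g. agentType=Organization
        feats.add("patientType=" + patientType);   // e.g. patientType=Organization
        feats.add("depPath=" + depPath);           // e.g. depPath=nsubj^dobj
        // a conjoined feature, useful for capturing rule-like combinations
        feats.add("verb+types=" + verbLemma + "|" + agentType + "|" + patientType);
        return feats;
    }

    public static void main(String[] args) {
        // "Google buys Yahoo for $100 million"; the sense id here is a placeholder
        System.out.println(features("buy", "SID-XXXXXXXX-V",
                "Organization", "Organization", "nsubj^dobj"));
    }
}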