Hi Cristian,

Great stuff! I will look into the Stanford NLP project to see how we can do that.
Regards,
Dileepa

On Thu, Nov 19, 2015 at 2:06 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> I created a git repository which contains the event extraction engine
> here: https://github.com/cpetroaca/stanbol-event-extraction-engine.
> I've started working on an event rule schema that will also incorporate
> a generic ontology definition schema, so that one can say that
> #Person = http://dbpedia.org/Person and then use #Person in the rules.
> I think that the fact that Stanbol has access to a DBpedia or Yago index
> will be of great value when we want to define events with specific
> object classes.
>
> Dileepa, if you still want to get involved, you can take a look at the
> Stanbol Stanford NLP project here:
> https://github.com/westei/stanbol-stanfordnlp and figure out how to add
> Collapsed Dependencies
> (http://nlp.stanford.edu/software/dependencies_manual.pdf) to it. We'll
> need them to sort out the subject, verb and objects.
>
> Thanks,
> Cristian
>
> On Mon, Oct 12, 2015 at 3:31 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
> > Can we get a separate branch where we can start developing the Event
> > Extraction engine?
> >
> > Thanks
> >
> > On Sun, Sep 20, 2015 at 4:26 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >
> >> Sorry, hit send before finishing the mail :).
> >>
> >> So, you will disambiguate it using WordNet like this:
> >> http://wordnetweb.princeton.edu/perl/webwn?s=attack&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=000000
> >>
> >> And then you would have a rule file which would contain something like:
> >>
> >> event name= "attack"
> >> event trigger= wordnet class of type = wordnet id && pos=verb
> >> agent=dependency_type:nsubj&&entity_type=Person||Location
> >> patient=dependency_type:dobj&&entity_type=Person||Location
> >>
> >> The dependency type points to the Stanford NLP dependency tree relation
> >> types described here:
> >> http://nlp.stanford.edu/software/stanford-dependencies.shtml
> >> The entity_type points to either the NER class or the WordNet class of
> >> the noun in the noun phrase.
> >>
> >> This approach was inspired by this paper:
> >> http://www.surdeanu.info/mihai/papers/acl2015.pdf, with the difference
> >> that I'm using WSD to disambiguate the event trigger.
> >>
> >> I'll start doing some experiments with this approach.
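Below is a minimal, standalone sketch of how the nsubj/dobj relations that the rule file keys on could be read from Stanford CoreNLP's collapsed dependencies. It uses the plain CoreNLP pipeline API directly rather than the stanbol-stanfordnlp integration, the example sentence is made up, and the annotator list and annotation class names should be checked against the CoreNLP version actually used:

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;
import edu.stanford.nlp.util.CoreMap;

public class CollapsedDependenciesSketch {

    public static void main(String[] args) {
        // tokenize, split sentences, tag, lemmatize, run NER and parse;
        // the parse annotator also produces the dependency graphs
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("The rebels attacked the village.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // the collapsed, CC-processed dependencies of the sentence
            SemanticGraph deps = sentence.get(
                    SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            for (SemanticGraphEdge edge : deps.edgeIterable()) {
                // prints relation(governor, dependent), e.g. nsubj(attacked, rebels)
                System.out.println(edge.getRelation() + "("
                        + edge.getGovernor().word() + ", "
                        + edge.getDependent().word() + ")");
            }
        }
    }
}

For that sentence this should print, among others, something like nsubj(attacked, rebels) and dobj(attacked, village), which is exactly the shape the agent/patient patterns in the rule file match against.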
> >>
> >> On Sun, Sep 20, 2015 at 4:14 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>
> >>> Hi Dileepa,
> >>>
> >>> I've been thinking more about the approach of using a Word Sense
> >>> Disambiguation tool to classify the verb in the sentence and I think it
> >>> may be a good approach. The verb seems to be the event trigger, and once
> >>> you know its actual meaning (by applying a WordNet class or some other
> >>> DB used for WSD) I think it's quite straightforward to identify the
> >>> actors in the event (agent, patient, instrument, etc.) by applying some
> >>> user defined rules for that verb class.
> >>>
> >>> For example, if you have the verb "attack", which can have multiple
> >>> meanings depending on the context, you will disambiguate it using
> >>> WordNet like this:
> >>>
> >>> On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> >>>
> >>>> Hi Cristian,
> >>>>
> >>>> Interesting ideas. Let me do some background reading on this, so I can
> >>>> also participate in the discussion better.
> >>>>
> >>>> Thanks,
> >>>> Dileepa
> >>>>
> >>>> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>>
> >>>> > Another approach to this would be to use a semantic role labeling
> >>>> > tool [1] to determine the type of relation between the subject and
> >>>> > the object.
> >>>> >
> >>>> > Or we could use Word Sense Disambiguation to determine the WordNet
> >>>> > class of the verb (this way we have a standard relation definition)
> >>>> > and, based on what relation type it is, we can search for the
> >>>> > subject and object using dependency tree parsing in Stanford NLP.
> >>>> >
> >>>> > These 2 options ensure that we can have a much bigger recall, but
> >>>> > I'm not sure about the precision...
> >>>> >
> >>>> > So I think we'll need to first settle on the method of implementing
> >>>> > this engine before starting anything.
> >>>> >
> >>>> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
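To make the WordNet side of this more concrete, here is a small sketch that simply lists the candidate verb senses of "attack". It assumes the MIT JWI library and a local WordNet install (the dictionary path is only an example); an actual WSD step would still have to pick one of these senses from the sentence context, and that sense id would then play the role of the "wordnet id" in the event trigger rule:

import java.io.File;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;

public class AttackSensesSketch {

    public static void main(String[] args) throws Exception {
        // path to the WordNet "dict" directory; adjust for the local install
        IDictionary dict = new Dictionary(new File("/usr/local/WordNet-3.0/dict"));
        dict.open();

        // all verb senses of "attack"; WSD would choose one based on context
        IIndexWord idxWord = dict.getIndexWord("attack", POS.VERB);
        for (IWordID wordId : idxWord.getWordIDs()) {
            IWord word = dict.getWord(wordId);
            System.out.println(word.getSynset().getID() + " : "
                    + word.getSynset().getGloss());
        }
        dict.close();
    }
}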
> >>>> >
> >>>> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>> >
> >>>> > > Hi Dileepa,
> >>>> > >
> >>>> > > Unfortunately I did not have the time to work on this at all, so
> >>>> > > there is no code base. But I'd be happy to start contributing
> >>>> > > something to this engine and I think it would also be very helpful
> >>>> > > if you are able to contribute to it as well.
> >>>> > > I did get a chance to test the Stanford relation extractor, which
> >>>> > > works fine but is quite limited to a handful of relation types
> >>>> > > (live_in, located_in, org_based_in, work_for). So we would need to
> >>>> > > train other models if we want to increase the number of relation
> >>>> > > types.
> >>>> > > I also think that the Event Extraction Engine should work in
> >>>> > > conjunction with any coreference and comention engines we have, to
> >>>> > > increase the relation count.
> >>>> > >
> >>>> > > Regards,
> >>>> > > Cristian
> >>>> > >
> >>>> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> >>>> > >
> >>>> > >> Hi Cristian and all,
> >>>> > >>
> >>>> > >> Can I please know the status of this event extraction engine?
> >>>> > >> Event extraction is a really useful feature for semantic
> >>>> > >> enhancements and I am interested in collaborating on this work.
> >>>> > >> Is there any code base you are currently working on for this
> >>>> > >> engine?
> >>>> > >>
> >>>> > >> Thanks,
> >>>> > >> Dileepa
> >>>> > >>
> >>>> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>>> > >>
> >>>> > >> > Hi Edi,
> >>>> > >> >
> >>>> > >> > Thanks for the info. The Stanford Relation Extractor sounds very
> >>>> > >> > interesting. I'll give it a try.
> >>>> > >> >
> >>>> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
> >>>> > >> >
> >>>> > >> > > Hi Cristian,
> >>>> > >> > >
> >>>> > >> > > Here are a few more resources on Semantic Role/Relationship
> >>>> > >> > > Labeling:
> >>>> > >> > > 1. FrameNet, VerbNet and WordNet on the data side
> >>>> > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the
> >>>> > >> > > code side
> >>>> > >> > >
> >>>> > >> > > The last one links to a great paper which I believe holds great
> >>>> > >> > > potential for Stanbol: "A Linear Programming Formulation for
> >>>> > >> > > Global Inference in Natural Language Tasks".
> >>>> > >> > >
> >>>> > >> > > Edi
> >>>> > >> > >
> >>>> > >> > > From: Cristian Petroaca <cristian.petro...@gmail.com>
> >>>> > >> > > To: dev@stanbol.apache.org
> >>>> > >> > > Sent: Sunday, February 15, 2015 6:34 AM
> >>>> > >> > > Subject: Event Extraction Engine
> >>>> > >> > >
> >>>> > >> > > Hi All,
> >>>> > >> > >
> >>>> > >> > > Quite a while ago I started a discussion on this list about
> >>>> > >> > > Event Extraction from text. See
> >>>> > >> > > https://issues.apache.org/jira/browse/STANBOL-1121.
> >>>> > >> > >
> >>>> > >> > > I'd like to get started on the actual work. I have been thinking
> >>>> > >> > > about how best to approach this, and there are some things that
> >>>> > >> > > I would do differently than what the JIRA describes. I'd like to
> >>>> > >> > > get your feedback on it.
> >>>> > >> > >
> >>>> > >> > > Basically the main approach would be:
> >>>> > >> > >
> >>>> > >> > > 1. Detect all NERs and their co-references.
> >>>> > >> > >
> >>>> > >> > > 2. Apply semantic role labeling on the sentences where the above
> >>>> > >> > > mentioned NERs reside.
> >>>> > >> > > I found some interesting Semantic Role Labeling libraries such
> >>>> > >> > > as https://code.google.com/p/mate-tools/ or
> >>>> > >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> >>>> > >> > > With this I'll be able to detect the Agent, the Verb (action),
> >>>> > >> > > the Patient and the Instruments.
> >>>> > >> > >
> >>>> > >> > > This could be a minimal implementation of the engine. After that
> >>>> > >> > > I can simply create the event data model as described in the
> >>>> > >> > > JIRA and annotate the text.
> >>>> > >> > > But this does not actually detect what kind of event it is or
> >>>> > >> > > what the event-specific roles are that the entities have in the
> >>>> > >> > > relation.
> >>>> > >> > >
> >>>> > >> > > For example, we can have the sentence "Google buys Yahoo for
> >>>> > >> > > $100 million". There is a lot more to be said about this
> >>>> > >> > > sentence than simply that "Google" is the agent and "Yahoo" is
> >>>> > >> > > the patient. This is actually an acquisition event: "Google" is
> >>>> > >> > > the buyer and "Yahoo" the bought entity.
> >>>> > >> > > We would also need to align synonym phrases such as "buy" or
> >>>> > >> > > "acquire" to a common ontology so that we know that both refer
> >>>> > >> > > to the same Acquisition event.
> >>>> > >> > >
> >>>> > >> > > Having said that, we would add a new step:
> >>>> > >> > > 3. Try to detect the event type and event details.
> >>>> > >> > >
> >>>> > >> > > This can be done by either:
> >>>> > >> > >
> >>>> > >> > > 3.1 Rule based: hand-written rules which would map a certain
> >>>> > >> > > sentence structure, such as the name of the verb and the types
> >>>> > >> > > of the entities acting as agent and patient, to a certain event
> >>>> > >> > > type (a rough sketch of such a rule is included below, after
> >>>> > >> > > 3.2).
> >>>> > >> > > This has the benefit of being easy to build but is quite
> >>>> > >> > > inflexible.
> >>>> > >> > >
> >>>> > >> > > 3.2 Statistical based: train a model which would be able to
> >>>> > >> > > classify an event type based on features of the sentence such
> >>>> > >> > > as verb type, entity type, role type, etc. This is the approach
> >>>> > >> > > described here: http://web.stanford.edu/~jurafsky/mintz.pdf.
> >>>> > >> > > This would be quite hard to build but quite flexible.
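As an illustration of option 3.1, here is a rough sketch of what such a hand-written rule could look like in code. All class, field and role names below are made up for the example (this is not an existing Stanbol API): a rule maps a set of trigger lemmas plus constraints on the agent and patient entity types to an event type, e.g. "buy"/"acquire" with an Organization agent and patient to an Acquisition event.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical rule representation for option 3.1; not an existing Stanbol class.
public class EventRule {

    private final String eventType;
    private final Set<String> triggerLemmas; // verbs that trigger the event
    private final Set<String> agentTypes;    // allowed entity types of the nsubj
    private final Set<String> patientTypes;  // allowed entity types of the dobj

    public EventRule(String eventType, Set<String> triggerLemmas,
                     Set<String> agentTypes, Set<String> patientTypes) {
        this.eventType = eventType;
        this.triggerLemmas = triggerLemmas;
        this.agentTypes = agentTypes;
        this.patientTypes = patientTypes;
    }

    /** Returns the event type if the extracted triple matches this rule, otherwise null. */
    public String match(String verbLemma, String agentType, String patientType) {
        if (triggerLemmas.contains(verbLemma)
                && agentTypes.contains(agentType)
                && patientTypes.contains(patientType)) {
            return eventType;
        }
        return null;
    }

    public static void main(String[] args) {
        // "Google buys Yahoo for $100 million" -> verb "buy", two Organization entities
        EventRule acquisition = new EventRule("Acquisition",
                new HashSet<>(Arrays.asList("buy", "acquire")),
                new HashSet<>(Arrays.asList("Organization")),
                new HashSet<>(Arrays.asList("Organization")));
        // prints "Acquisition"; the agent would then be labelled the buyer
        // and the patient the bought entity
        System.out.println(acquisition.match("buy", "Organization", "Organization"));
    }
}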
> >>>> > >> > >
> >>>> > >> > > I think this 3rd step of detecting event types & details would
> >>>> > >> > > be most efficient for domain-specific events. We would have
> >>>> > >> > > configs with several models for several domains available, and
> >>>> > >> > > the user could either use one of the pre-existing models or
> >>>> > >> > > create a new one.
> >>>> > >> > >
> >>>> > >> > > I don't have any practical experience with training models or
> >>>> > >> > > text classification based on features (but I've been doing a
> >>>> > >> > > lot of reading on it) so I'm not sure exactly how feasible what
> >>>> > >> > > I said at point no. 3 actually is.
> >>>> > >> > >
> >>>> > >> > > Regards,
> >>>> > >> > > Cristian
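For option 3.2, here is a sketch of the kind of flat feature strings such a classifier could be trained on, loosely following the lexical plus entity-type feature idea in the Mintz et al. paper. No particular ML library is assumed, and the feature names and the sense-id placeholder are made up for the example:

import java.util.ArrayList;
import java.util.List;

// Illustrative only: turns one candidate event mention into flat feature strings
// that could be fed to any standard classifier (maxent, SVM, ...).
public class EventFeatureSketch {

    public static List<String> features(String verbLemma, String verbSenseId,
                                        String agentType, String patientType,
                                        String depPath) {
        List<String> feats = new ArrayList<>();
        feats.add("verb=" + verbLemma);            // e.g. verb=buy
        feats.add("verbSense=" + verbSenseId);     // WordNet sense id chosen by WSD
        feats.add("agentType=" + agentType);       // e.g. agentType=Organization
        feats.add("patientType=" + patientType);   // e.g. patientType=Organization
        feats.add("depPath=" + depPath);           // e.g. depPath=nsubj^dobj
        // a conjoined feature, useful for capturing rule-like combinations
        feats.add("verb+types=" + verbLemma + "|" + agentType + "|" + patientType);
        return feats;
    }

    public static void main(String[] args) {
        // "Google buys Yahoo for $100 million"; the sense id here is a placeholder
        System.out.println(features("buy", "SID-XXXXXXXX-V",
                "Organization", "Organization", "nsubj^dobj"));
    }
}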