Hi Cristian and all,

Could you please let me know the status of this event extraction engine? Event extraction is a really useful feature for semantic enhancements and I am interested in collaborating on this work. Is there a code base you are currently working on for this engine?

Thanks,
Dileepa

On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Hi Edi,
>
> Thanks for the info. Stanford Relation Extractor sounds very interesting.
> I'll give it a try.
>
> 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
>
> > Hi Cristian,
> > Here are a few more resources on Semantic Role/Relationship Labeling:
> > 1. FrameNet, VerbNet and WordNet on the data side
> > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
> > The last one links to a great paper which I believe holds great potential
> > for Stanbol:
> > A Linear Programming Formulation for Global Inference in Natural Language
> > Tasks
> > View on www.cnts.ua.ac.be
> >
> > Edi
> >
> > From: Cristian Petroaca <cristian.petro...@gmail.com>
> > To: dev@stanbol.apache.org
> > Sent: Sunday, February 15, 2015 6:34 AM
> > Subject: Event Extraction Engine
> >
> > Hi All,
> >
> > Quite a while ago I started a discussion on this list about Event
> > Extraction from text. See
> > https://issues.apache.org/jira/browse/STANBOL-1121 .
> >
> > I'd like to get started on the actual work and I have been thinking how to
> > best approach this and there are some things that I would do differently
> > than what the JIRA describes. I'd like to get your feedback on it.
> >
> > Basically the main approach would be:
> >
> > 1. Detect all NERs and their co-references.
> >
> > 2. Apply semantic role labeling on the sentences where the above mentioned
> > NERs reside.
> > I found some interesting Semantic Role labeling libraries such as
> > https://code.google.com/p/mate-tools/ or
> > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> > With this I'll be able to detect the Agent, the Verb (action) and the
> > Patient and Instruments.
> >
> > This could be a minimal implementation of the engine. After that I can
> > simply create the event data model as described in the JIRA and annotate
> > the text.
> > But this does not actually detect what kind of event it is or what the
> > event-specific roles are that the entities have in the relation.
> >
> > For example we can have the sentence "Google buys Yahoo for $100 million".
> > There is a lot more to be said about this sentence than simply that
> > "Google" is the agent and "Yahoo" is the Patient. This is actually an
> > acquisition event and "Google" is the buyer and "Yahoo" the bought entity.
> > We would also need to align synonym phrases such as "buy" or "acquire" to
> > a common ontology so that we know that both refer to the same Acquisition
> > event.
> >
> > Having said that, we would add a new step:
> > 3. Try to detect event type and event details.
> >
> > This can be done by either:
> >
> > 3.1 Rule based: hand-written rules which would map a certain sentence
> > structure, such as the name of the verb and the type of entities as agent
> > and patient, to a certain event type.
> > This is easy to build but quite inflexible.
> >
> > 3.2 Statistical based: train a model which would be able to classify an
> > event type based on the features of the sentence such as verb type, entity
> > type, role type, etc. This is the approach described here:
> > http://web.stanford.edu/~jurafsky/mintz.pdf.
> > This would be quite hard to build but quite flexible.
> >
> > This 3rd step of detecting event types & details I think would be most
> > efficient for domain specific events. We would have configs with several
> > models for several domains available and the user could either use one of
> > the pre-existing models or create a new one.
> >
> > I don't have any practical experience with training models or text
> > classification based on features (but I've been doing a lot of reading on
> > it) so I'm not sure exactly how feasible what I said at point no 3 actually
> > is.
> >
> > Regards,
> > Cristian
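
For reference, the rule-based option (3.1) in the quoted proposal could be sketched roughly as below. This is only an illustration in Python, not code from STANBOL-1121 or from any existing Stanbol engine. The Frame class, the VERB_TO_EVENT and ROLE_RULES tables and the entity type names are all hypothetical, and the SRL output is assumed to be already reduced to a verb lemma plus a typed agent and patient.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical verb lexicon: maps verb lemmas to a shared event type,
# so that "buy" and "acquire" resolve to the same Acquisition event.
VERB_TO_EVENT = {
    "buy": "Acquisition",
    "acquire": "Acquisition",
    "purchase": "Acquisition",
}

# Hypothetical rules keyed by (event type, agent entity type, patient entity type).
# Each rule renames the generic SRL roles to event-specific ones.
ROLE_RULES = {
    ("Acquisition", "Organization", "Organization"): {
        "agent": "buyer",
        "patient": "bought",
    },
}


@dataclass
class Frame:
    """One predicate-argument structure, as an SRL step is assumed to emit it."""
    verb_lemma: str
    agent: str
    agent_type: str
    patient: str
    patient_type: str


def classify(frame: Frame) -> Optional[dict]:
    """Return the typed event for a frame, or None if no rule matches."""
    event_type = VERB_TO_EVENT.get(frame.verb_lemma)
    if event_type is None:
        return None
    roles = ROLE_RULES.get((event_type, frame.agent_type, frame.patient_type))
    if roles is None:
        return None
    return {
        "type": event_type,
        roles["agent"]: frame.agent,
        roles["patient"]: frame.patient,
    }


# "Google buys Yahoo for $100 million", reduced to its agent and patient.
frame = Frame("buy", "Google", "Organization", "Yahoo", "Organization")
print(classify(frame))
```

Covering a new event type here means adding lexicon entries and a role rule by hand, which matches the trade-off noted for 3.1: easy to build, but inflexible once sentences fall outside the written rules.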