Hi Rupert,

OK, so after looking at the JSON output from the Stanford NLP Server and the coref module, I'm thinking I can represent the coreference information this way: each "Token" or "Chunk" will contain an additional coref annotation with the following structure:

"stanbol.enhancer.nlp.coref" : {
    "tag" : //does this need to exist?
    "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
    "mentions" : [
        {
            "sentenceNo" : 1, //the sentence in which the mention is found
            "startWord" : 2, //the first word making up the mention
            "endWord" : 3 //the last word making up the mention
        }, ...
    ],
    "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
}

The CorefTag should resemble this model.

What do you think?

Cristian

2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com>

> Hi Cristian,
>
> you cannot directly call StanfordNLP components from Stanbol, but you have to extend the RESTful service to include the information you need. The main reason for that is that the license of StanfordNLP is not compatible with the Apache Software License. So Stanbol cannot directly link to the StanfordNLP API.
>
> You will need to
>
> 1. define an additional class {yourTag} extends Tag<{yourType}> in the o.a.s.enhancer.nlp module
> 2. add JSON parsing and serialization support for this tag to the o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>
> As (1) would be necessary anyway the only additional thing you need to develop is (2). After that you can add {yourTag} instances to the AnalyzedText in the StanfordNLP integration. The RestfulNlpAnalysisEngine will parse them from the response. All engines executed after the RestfulNlpAnalysisEngine will have access to your annotations.
>
> If you have a design for {yourTag} - the model you would like to use to represent your data - I can help with (1) and (2).
>
> best
> Rupert
>
>
> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> > Hi Rupert,
> >
> > Thanks for the info.
> > Looking at the stanbol-stanfordnlp project I see that Stanford NLP is not implemented as an EnhancementEngine but rather it is used directly in a Jetty Server instance. How does that fit into the Stanbol stack? For example, how can I call the StanfordNlpAnalyzer's routine from my TripleExtractionEnhancementEngine which lives in the Stanbol stack?
> >
> > Thanks,
> > Cristian
> >
> >
> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >
> >> Hi Cristian,
> >>
> >> Sorry for the late response, but I was offline for the last two weeks.
> >>
> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> > Hi Rupert,
> >> >
> >> > After doing some tests it seems that the Stanford NLP coreference module is much more accurate than the Open NLP one. So I decided to extend Stanford NLP to add coreference there.
> >>
> >> The Stanford NLP integration is not part of the Stanbol codebase because the licenses are not compatible.
> >>
> >> You can find the Stanford NLP integration on
> >>
> >> https://github.com/westei/stanbol-stanfordnlp
> >>
> >> just create a fork and send pull requests.
> >>
> >>
> >> > Could you add the necessary projects on the branch? And also remove the Open NLP ones?
> >> >
> >>
> >> Currently the branch
> >>
> >> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>
> >> only contains the "nlp" and the "nlp-json" modules. IMO those should be enough for adding coreference support.
> >>
> >> IMO you will need to
> >>
> >> * add a model for representing coreference to the nlp module
> >> * add parsing and serializing support to the nlp-json module
> >> * add the implementation to your fork of the stanbol-stanfordnlp project
> >>
> >> best
> >> Rupert
> >>
> >>
> >> > Thanks,
> >> > Cristian
> >> >
> >> >
> >> > 2013/7/5 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >
> >> >> Hi Cristian,
> >> >>
> >> >> I created the branch at
> >> >>
> >> >> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >> >>
> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if you would like to have more.
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >>
> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> >> > Hi Rupert,
> >> >> >
> >> >> > I created jiras: https://issues.apache.org/jira/browse/STANBOL-1132 and https://issues.apache.org/jira/browse/STANBOL-1133. The original one is dependent upon these.
> >> >> > Please let me know when I can start using the branch.
> >> >> >
> >> >> > Thanks,
> >> >> > Cristian
> >> >> >
> >> >> >
> >> >> > 2013/6/27 Cristian Petroaca <cristian.petro...@gmail.com>
> >> >> >
> >> >> >>
> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>
> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my previous e-mail. By the way, does Open NLP have the ability to build dependency trees?
> >> >> >>> >
> >> >> >>>
> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >> >> >>>
> >> >> >>
> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol, I'll take a look at how I can extend its integration to include the dependency tree feature.
> >> >> >>
> >> >> >>> >
> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com>
> >> >> >>> >
> >> >> >>> >> Hi Rupert,
> >> >> >>> >>
> >> >> >>> >> I created jira https://issues.apache.org/jira/browse/STANBOL-1121.
> >> >> >>> >> As you suggested I would start with extending the Stanford NLP with co-reference resolution, but I think also with dependency trees, because I also need to know the Subject of the sentence and the object that it affects, right?
> >> >> >>> >>
> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for co-reference and dependency trees, how do I proceed with this? Do I create 2 new sub-tasks to the already opened Jira? After that can I start implementing on my local copy of Stanbol, and when I'm done I'll send you guys the patch for review?
> >> >> >>> >>
> >> >> >>>
> >> >> >>> I would create two "New Feature" type Issues, one for adding support for "dependency trees" and the other for "co-reference" support. You should also define "depends on" relations between STANBOL-1121 and those two new issues.
> >> >> >>>
> >> >> >>> Sub-tasks could also work, but as adding those features would also be interesting for other things I would rather define them as separate issues.
> >> >> >>>
> >> >> >>
> >> >> >> 2 New Features connected with the original jira it is then.
> >> >> >>
> >> >> >>
> >> >> >>> If you would prefer to work in an own branch please tell me.
> >> >> >>> This could have the advantage that patches would not be affected by changes in the trunk.
> >> >> >>>
> >> >> >> Yes, a separate branch sounds good.
> >> >> >>
> >> >> >>> best
> >> >> >>> Rupert
> >> >> >>>
> >> >> >>> >> Regards,
> >> >> >>> >> Cristian
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> 2013/6/18 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>> >>
> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> > Hi Rupert,
> >> >> >>> >>> >
> >> >> >>> >>> > Agreed on the SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation data structure.
> >> >> >>> >>> >
> >> >> >>> >>> > Should I open up a Jira for all of this in order to encapsulate this information and establish the goals and these initial steps towards these goals?
> >> >> >>> >>>
> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
> >> >> >>> >>>
> >> >> >>> >>> > How should I proceed further? Should I create some design documents that need to be reviewed?
> >> >> >>> >>>
> >> >> >>> >>> Usually it is best to write design related text directly in JIRA by using Markdown [1] syntax. This will allow us later to use this text directly for the documentation on the Stanbol Webpage.
> >> >> >>> >>>
> >> >> >>> >>> best
> >> >> >>> >>> Rupert
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
> >> >> >>> >>> >
> >> >> >>> >>> > Regards,
> >> >> >>> >>> > Cristian
> >> >> >>> >>> >
> >> >> >>> >>> >
> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>> >>> >
> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> >> > Hi Rupert,
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> >> Hi Cristian, all
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> really interesting use case!
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could work out. These suggestions are mainly based on experiences and lessons learned in the LIVE [2] project where we built an information system for the Olympic Games in Peking. While this project excluded the extraction of Events from unstructured text (because the Olympic Information System was already providing event data as XML messages), the semantic search capabilities of this system were very similar to the ones described by your use case.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a formal representation of the situation described by the text.
> >> >> >>> >>> >> >> So let's assume that the goal is to annotate a Setting (or Situation) described in the text - a fise:SettingAnnotation.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to model those. The important relation for modeling this is Participation:
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> where ..
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> * ED are Endurants (continuants): Endurants do have an identity so we would typically refer to them as Entities referenced by a setting. Note that this includes physical, non-physical as well as social-objects.
> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): Perdurants are entities that happen in time. This refers to Events, Activities ...
> >> >> >>> >>> >> >> * PC is Participation: it is a time-indexed relation where Endurants participate in Perdurants
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate resources because RDF does not allow for n-ary relations.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really handy to define one resource being the context for all described data. I would call this "fise:SettingAnnotation" and define it as a sub-concept of fise:Enhancement. All further enhancements about the extracted Setting would define a "fise:in-setting" relation to it.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: is used to annotate that an Endurant is participating in a setting (fise:in-setting fise:SettingAnnotation). The Endurant itself is described by existing fise:TextAnnotation (the mentions) and fise:EntityAnnotation (suggested Entities). Basically the fise:ParticipantAnnotation will allow an EnhancementEngine to state that several mentions (in possibly different sentences) do represent the same Endurant as participating in the Setting. In addition it would be possible to use the dc:type property (similar to fise:TextAnnotation) to refer to the role(s) of a participant (e.g. the set: Agent (intentionally performs an action), Cause (unintentionally, e.g. a mud slide), Patient (a passive role in an activity) and Instrument (aids a process)), but I am wondering if one could extract that information.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a Perdurant in the context of the Setting. Also fise:OccurrentAnnotation can link to fise:TextAnnotation (typically verbs in the text defining the perdurant) as well as fise:EntityAnnotation suggesting well known Events in a knowledge base (e.g. an Election in a country, or an uprising ...).
> >> >> >>> >>> >> >> In addition fise:OccurrentAnnotation can define dc:has-participant links to fise:ParticipantAnnotation. In this case it is explicitly stated that an Endurant (the fise:ParticipantAnnotation) is involved in this Perdurant (the fise:OccurrentAnnotation). As Occurrences are temporally indexed this annotation should also support properties for defining the xsd:dateTime for the start/end.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> > Indeed, an event based data structure makes a lot of sense, with the remark that you probably won't always be able to extract the date for a given setting (situation).
> >> >> >>> >>> >> > There are 2 things which are unclear though.
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 1. Perdurant: You could have situations in which the object upon which the Subject (or Endurant) is acting is not a transitory object (such as an event, activity) but rather another Endurant. For example we can have the phrase "USA invades Irak" where "USA" is the Endurant (Subject) which performs the action of "invading" on another Endurant, namely "Irak".
> >> >> >>> >>> >> >
> >> >> >>> >>> >>
> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both are Endurants. The activity "invading" would be the Perdurant.
> >> >> >>> >>> >> So ideally you would have a "fise:SettingAnnotation" with:
> >> >> >>> >>> >>
> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with the dc:type caos:Agent, linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation linking to dbpedia:United_States
> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient, linking to a fise:TextAnnotation for "Irak" and a fise:EntityAnnotation linking to dbpedia:Iraq
> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" with the dc:type caos:Activity, linking to a fise:TextAnnotation for "invades"
> >> >> >>> >>> >>
> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the Object, come into this? I imagined that the Endurant would have a dc:"property" where the property = verb which links to the Object in noun form. For example take again the sentence "USA invades Irak". You would have the "USA" Entity with dc:invader which points to the Object "Irak". The Endurant would have as many dc:"property" elements as there are verbs which link it to an Object.
> >> >> >>> >>> >>
> >> >> >>> >>> >> As explained above you would have a fise:OccurrentAnnotation that represents the Perdurant. The information that the activity mention in the text is "invades" would be expressed by linking to a fise:TextAnnotation.
> >> >> >>> >>> >> If you can also provide an Ontology for Tasks that defines "myTasks:invade", the fise:OccurrentAnnotation could also link to a fise:EntityAnnotation for this concept.
> >> >> >>> >>> >>
> >> >> >>> >>> >> best
> >> >> >>> >>> >> Rupert
> >> >> >>> >>> >>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> >> ### Consuming the data:
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> I think this model should be sufficient for use-cases as described by you.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Users would be able to consume data on the setting level. This can be done by simply retrieving all fise:ParticipantAnnotation as well as fise:OccurrentAnnotation linked with a setting. BTW this was the approach used in LIVE [2] for semantic search. It allows queries for Settings that involve specific Entities, e.g. you could filter for Settings that involve a {Person}, activities:Arrested and a specific {Uprising}. However note that with this approach you will also get results for Settings where the {Person} participated and another person was arrested.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the fise:OccurrentAnnotation. This would allow a much higher level of granularity (e.g. it would allow to correctly answer the query used as an example above).
> >> >> >>> >>> >> >> But I am wondering if the quality of the Setting extraction will be sufficient for this. I also have doubts if this can still be realized by using semantic indexing to Apache Solr or if it would be better/necessary to store results in a TripleStore and use SPARQL for retrieval.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] is also very relevant for this (especially note chapter 7 SPOTL(X) Representation).
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially Events) in knowledge bases based on Settings extracted from Documents. As per definition - in DOLCE - Perdurants are temporally indexed. That means that at the time when added to a knowledge base they might still be in process. So the creation, enriching and refinement of such Entities in the knowledge base seems to be critical for a System like the one described in your use-case.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> >> >> >
> >> >> >>> >>> >> >> > First of all I have to mention that I am new in the field of semantic technologies, I've started to read about them in the last 4-5 months. Having said that, I have a high level overview of what is a good approach to solve this problem. There are a number of papers on the internet which describe what steps need to be taken, such as: named entity recognition, co-reference resolution, pos tagging and others.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence detection, tokenization, POS tagging, Chunking, NER and lemma. Support for co-reference resolution and dependency trees is currently missing.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment it only supports English, but I am already working to include the other supported languages. Other NLP frameworks that are already integrated with Stanbol are Freeling [5] and Talismane [6]. But note that for all those the integration excludes support for co-reference and dependency trees.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Anyways I am confident that one can implement a first prototype by only using Sentences and POS tags and - if available - Chunks (e.g. Noun phrases).
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like Relation extraction would be implemented as an EnhancementEngine?
> >> >> >>> >>> >> > What kind of effort would be required for a co-reference resolution tool integration into Stanbol?
> >> >> >>> >>> >> >
> >> >> >>> >>> >>
> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But before we can build such an engine we would need to
> >> >> >>> >>> >>
> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations for co-reference
> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those annotations so that the RESTful NLP Analysis Service can provide co-reference information
> >> >> >>> >>> >>
> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the extracted information. I'll take a closer look at Dolce.
> >> >> >>> >>> >>
> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent Events will only pay off if we can also successfully extract such information from processed texts.
> >> >> >>> >>> >>
> >> >> >>> >>> >> I would start with
> >> >> >>> >>> >>
> >> >> >>> >>> >> * fise:SettingAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>
> >> >> >>> >>> >> * fise:ParticipantAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there are more suggestions)
> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument, fise:Cause
> >> >> >>> >>> >>
> >> >> >>> >>> >> * fise:OccurrentAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >> >> >>> >>> >>     * dc:type set to fise:Activity
> >> >> >>> >>> >>
> >> >> >>> >>> >> If it turns out that we can extract more, we can add more structure to those annotations. We might also think about using an own namespace for those extensions to the annotation structure.
> >> >> >>> >>> >>
> >> >> >>> >>> >> > 2. Determine how all of this should be integrated into Stanbol.
> >> >> >>> >>> >>
> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement chain that does NLP processing and EntityLinking.
> >> >> >>> >>> >>
> >> >> >>> >>> >> You should have a look at
> >> >> >>> >>> >>
> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of things with NLP processing results (e.g. connecting adjectives (via verbs) to nouns/pronouns). So as long as we cannot use explicit dependency trees your code will need to do similar things with Nouns, Pronouns and Verbs.
> >> >> >>> >>> >>
> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java representation of present fise:TextAnnotation and fise:EntityAnnotation [2]. Something similar will also be required by the EventExtractionEngine for fast access to such annotations while iterating over the Sentences of the text.
> >> >> >>> >>> >>
> >> >> >>> >>> >>
> >> >> >>> >>> >> best
> >> >> >>> >>> >> Rupert
> >> >> >>> >>> >>
> >> >> >>> >>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >> >> >>> >>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >> >> >>> >>> >>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > Thanks
> >> >> >>> >>> >> >
> >> >> >>> >>> >> >> Hope this helps to bootstrap this discussion
> >> >> >>> >>> >> >> best
> >> >> >>> >>> >> >> Rupert
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> --
> >> >> >>> >>> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> >> >>> >>> >> >> | Bodenlehenstraße 11             ++43-699-11108907
> >> >> >>> >>> >> >> | A-5500 Bischofshofen
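
For discussion, the coref annotation proposed at the top of this mail could be modelled roughly as below. This is only a minimal standalone sketch under stated assumptions: the names CorefTagSketch and Mention and the hand-written toJson() are hypothetical and exist only for illustration. It deliberately does not extend the real Tag class; an actual implementation would live in the o.a.s.enhancer.nlp module and get its parsing/serialization support in the o.a.s.enhancer.nlp.json module, as Rupert describes. The "tag" field is left out because the proposal itself leaves open whether it is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical standalone sketch of the proposed coref annotation.
 * It does NOT extend the real Stanbol Tag class; all names are illustrative.
 */
final class CorefTagSketch {

    /** One entry of the "mentions" array: a word span inside a sentence. */
    static final class Mention {
        final int sentenceNo;
        final int startWord;
        final int endWord;

        Mention(int sentenceNo, int startWord, int endWord) {
            if (startWord > endWord) {
                throw new IllegalArgumentException("startWord must not be after endWord");
            }
            this.sentenceNo = sentenceNo;
            this.startWord = startWord;
            this.endWord = endWord;
        }

        String toJson() {
            return "{\"sentenceNo\":" + sentenceNo
                    + ",\"startWord\":" + startWord
                    + ",\"endWord\":" + endWord + "}";
        }
    }

    /** Whether this token or chunk is the representative mention in the chain. */
    final boolean representative;
    /** The other mentions that belong to the same coreference chain. */
    final List<Mention> mentions;

    CorefTagSketch(boolean representative, List<Mention> mentions) {
        this.representative = representative;
        // Defensive copy so the tag stays immutable after construction.
        this.mentions = Collections.unmodifiableList(new ArrayList<>(mentions));
    }

    /** Hand-written serialization that mirrors the field names of the proposal. */
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"isRepresentative\":").append(representative);
        sb.append(",\"mentions\":[");
        for (int i = 0; i < mentions.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(mentions.get(i).toJson());
        }
        sb.append("],\"class\":\"org.apache.stanbol.enhancer.nlp.coref.CorefTag\"}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A representative mention with two further mentions in its chain.
        CorefTagSketch tag = new CorefTagSketch(true,
                Arrays.asList(new Mention(1, 2, 3), new Mention(2, 0, 0)));
        System.out.println(tag.toJson());
    }
}
```

Keeping the mention as a plain (sentenceNo, startWord, endWord) triple follows the structure in the proposal; whether the real model should instead reference Token/Chunk spans of the AnalyzedText directly is an open design question.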