Sorry, pressed send too soon :). Continued:
nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3), root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]

Given this, we can have for each "Token" an additional dependency annotation:

"stanbol.enhancer.nlp.dependency" : {
    "tag" : //is it necessary?
    "relations" : [
        {
            "type" : "nsubj",                   //type of relation
            "role" : "gov/dep",                 //whether the token is the governor (gov) or the dependent (dep)
            "dependencyValue" : "met",          //the word with which the token has a relation
            "dependencyIndexInSentence" : "2"   //the index of the dependency in the current sentence
        }
        ...
    ],
    "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
}

2013/9/1 Cristian Petroaca <cristian.petro...@gmail.com>

> Related to the Stanford Dependency Tree Feature, this is the way the
> output from the tool looks for this sentence: "Mary and Tom met Danny
> today":
>
>
> 2013/8/30 Cristian Petroaca <cristian.petro...@gmail.com>
>
>> Hi Rupert,
>>
>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>> the coref module I'm thinking I can represent the coreference information
>> this way:
>> Each "Token" or "Chunk" will contain an additional coref annotation with
>> the following structure:
>>
>> "stanbol.enhancer.nlp.coref" : {
>>     "tag" : //does this need to exist?
>>     "isRepresentative" : true/false, //whether this token or chunk is the representative mention in the chain
>>     "mentions" : [
>>         {
>>             "sentenceNo" : 1, //the sentence in which the mention is found
>>             "startWord" : 2,  //the first word making up the mention
>>             "endWord" : 3     //the last word making up the mention
>>         },
>>         ...
>>     ],
>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> }
>>
>> The CorefTag should resemble this model.
>>
>> What do you think?
>>
>> Cristian
>>
>>
>> 2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com>
>>
>>> Hi Cristian,
>>>
>>> you cannot directly call StanfordNLP components from Stanbol, but you
>>> have to extend the RESTful service to include the information you
>>> need. The main reason for that is that the license of StanfordNLP is
>>> not compatible with the Apache Software License, so Stanbol cannot
>>> directly link to the StanfordNLP API.
>>>
>>> You will need to
>>>
>>> 1. define an additional class {yourTag} extends Tag<{yourType}>
>>> in the o.a.s.enhancer.nlp module
>>> 2. add JSON parsing and serialization support for this tag to the
>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>>
>>> As (1) would be necessary anyway, the only additional thing you need to
>>> develop is (2). After that you can add {yourTag} instances to the
>>> AnalyzedText in the StanfordNLP integration. The
>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>> to your annotations.
>>>
>>> If you have a design for {yourTag} - the model you would like to use
>>> to represent your data - I can help with (1) and (2).
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>> <cristian.petro...@gmail.com> wrote:
>>> > Hi Rupert,
>>> >
>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see that
>>> > the Stanford NLP is not implemented as an EnhancementEngine but rather it
>>> > is used directly in a Jetty Server instance. How does that fit into the
>>> > Stanbol stack?
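To make Rupert's step (1) concrete, here is a rough sketch of what the DependencyTag proposed above could look like. Only a sketch: the field names mirror the JSON proposal, and it assumes the Tag base class in the o.a.s.enhancer.nlp module takes the tag string in its super constructor, the same way PosTag does.

package org.apache.stanbol.enhancer.nlp.dependency;

// import path assumed; the Tag base class lives in the o.a.s.enhancer.nlp module
import org.apache.stanbol.enhancer.nlp.model.tag.Tag;

/**
 * Sketch of a dependency relation tag, modelled after the JSON proposal above.
 */
public class DependencyTag extends Tag<DependencyTag> {

    /** Role of the annotated token within the relation. */
    public enum Role { GOV, DEP }

    private final String relationType;           // e.g. "nsubj"
    private final Role role;                      // governor or dependent
    private final String dependencyValue;         // the word the token is related to, e.g. "met"
    private final int dependencyIndexInSentence;  // index of that word in the current sentence

    public DependencyTag(String tag, String relationType, Role role,
            String dependencyValue, int dependencyIndexInSentence) {
        super(tag); // assumption: Tag(String) exists, as used by PosTag
        this.relationType = relationType;
        this.role = role;
        this.dependencyValue = dependencyValue;
        this.dependencyIndexInSentence = dependencyIndexInSentence;
    }

    public String getRelationType() { return relationType; }
    public Role getRole() { return role; }
    public String getDependencyValue() { return dependencyValue; }
    public int getDependencyIndexInSentence() { return dependencyIndexInSentence; }
}

The JSON support for step (2) would then simply serialize and parse exactly these fields, analogous to what PosTagSupport does for PosTag.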
For example how can I call the StanfordNlpAnalyzer's >>> routine >>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol >>> stack? >>> > >>> > Thanks, >>> > Cristian >>> > >>> > >>> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> > >>> >> Hi Cristian, >>> >> >>> >> Sorry for the late response, but I was offline for the last two weeks >>> >> >>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca >>> >> <cristian.petro...@gmail.com> wrote: >>> >> > Hi Rupert, >>> >> > >>> >> > After doing some tests it seems that the Stanford NLP coreference >>> module >>> >> is >>> >> > much more accurate than the Open NLP one.So I decided to extend >>> Stanford >>> >> > NLP to add coreference there. >>> >> >>> >> The Stanford NLP integration is not part of the Stanbol codebase >>> >> because the licenses are not compatible. >>> >> >>> >> You can find the Stanford NLP integration on >>> >> >>> >> https://github.com/westei/stanbol-stanfordnlp >>> >> >>> >> just create a fork and send pull requests. >>> >> >>> >> >>> >> > Could you add the necessary projects on the branch? And also remove >>> the >>> >> > Open NLP ones? >>> >> > >>> >> >>> >> Currently the branch >>> >> >>> >> >>> >> >>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ >>> >> >>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should >>> >> be enough for adding coreference support. >>> >> >>> >> IMO you will need to >>> >> >>> >> * add an model for representing coreference to the nlp module >>> >> * add parsing and serializing support to the nlp-json module >>> >> * add the implementation to your fork of the stanbol-stanfordnlp >>> project >>> >> >>> >> best >>> >> Rupert >>> >> >>> >> >>> >> >>> >> > Thanks, >>> >> > Cristian >>> >> > >>> >> > >>> >> > 2013/7/5 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> >> > >>> >> >> Hi Cristian, >>> >> >> >>> >> >> I created the branch at >>> >> >> >>> >> >> >>> >> >> >>> >> >>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ >>> >> >> >>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me know >>> if >>> >> >> you would like to have more >>> >> >> >>> >> >> best >>> >> >> Rupert >>> >> >> >>> >> >> >>> >> >> >>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca >>> >> >> <cristian.petro...@gmail.com> wrote: >>> >> >> > Hi Rupert, >>> >> >> > >>> >> >> > I created jiras : >>> https://issues.apache.org/jira/browse/STANBOL-1132and >>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The >>> original one >>> >> in >>> >> >> > dependent upon these. >>> >> >> > Please let me know when I can start using the branch. >>> >> >> > >>> >> >> > Thanks, >>> >> >> > Cristian >>> >> >> > >>> >> >> > >>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petro...@gmail.com> >>> >> >> > >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> >> >> >> >>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca >>> >> >> >>> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my >>> previous >>> >> >> e-mail. >>> >> >> >>> By >>> >> >> >>> > the way, does Open NLP have the ability to build dependency >>> trees? >>> >> >> >>> > >>> >> >> >>> >>> >> >> >>> AFAIK OpenNLP does not provide this feature. 
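Along the same lines, a sketch of the CorefTag behind the "stanbol.enhancer.nlp.coref" structure proposed earlier in this thread - again only illustrative, under the same assumption about the Tag base class; the Mention helper class and its fields simply mirror the proposed JSON:

package org.apache.stanbol.enhancer.nlp.coref;

import java.util.Collections;
import java.util.List;

// import path assumed; the Tag base class lives in the o.a.s.enhancer.nlp module
import org.apache.stanbol.enhancer.nlp.model.tag.Tag;

/**
 * Sketch of a co-reference tag: a flag marking the representative mention
 * plus the other mentions of the chain (sentence number, start word, end word).
 */
public class CorefTag extends Tag<CorefTag> {

    /** A single mention of the co-reference chain. */
    public static class Mention {
        private final int sentenceNo; // the sentence in which the mention is found
        private final int startWord;  // the first word making up the mention
        private final int endWord;    // the last word making up the mention

        public Mention(int sentenceNo, int startWord, int endWord) {
            this.sentenceNo = sentenceNo;
            this.startWord = startWord;
            this.endWord = endWord;
        }
        public int getSentenceNo() { return sentenceNo; }
        public int getStartWord() { return startWord; }
        public int getEndWord() { return endWord; }
    }

    private final boolean representative;
    private final List<Mention> mentions;

    public CorefTag(String tag, boolean representative, List<Mention> mentions) {
        super(tag); // assumption: Tag(String) exists, as used by PosTag
        this.representative = representative;
        this.mentions = Collections.unmodifiableList(mentions);
    }

    public boolean isRepresentative() { return representative; }
    public List<Mention> getMentions() { return mentions; }
}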
>>> >> >> >>> >>> >> >> >> >>> >> >> >> Then , since the Stanford NLP lib is also integrated into >>> Stanbol, >>> >> I'll >>> >> >> >> take a look at how I can extend its integration to include the >>> >> >> dependency >>> >> >> >> tree feature. >>> >> >> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >> > >>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com> >>> >> >> >>> > >>> >> >> >>> >> Hi Rupert, >>> >> >> >>> >> >>> >> >> >>> >> I created jira >>> >> https://issues.apache.org/jira/browse/STANBOL-1121. >>> >> >> >>> >> As you suggested I would start with extending the Stanford >>> NLP >>> >> with >>> >> >> >>> >> co-reference resolution but I think also with dependency >>> trees >>> >> >> because >>> >> >> >>> I >>> >> >> >>> >> also need to know the Subject of the sentence and the object >>> >> that it >>> >> >> >>> >> affects, right? >>> >> >> >>> >> >>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol >>> for >>> >> >> >>> >> co-reference and dependency trees, how do I proceed with >>> this? >>> >> Do I >>> >> >> >>> create >>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that can I >>> >> start >>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm done >>> I'll >>> >> send >>> >> >> >>> you >>> >> >> >>> >> guys the patch fo review? >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> I would create two "New Feature" type Issues one for adding >>> support >>> >> >> >>> for "dependency trees" and the other for "co-reference" >>> support. You >>> >> >> >>> should also define "depends on" relations between STANBOL-1121 >>> and >>> >> >> >>> those two new issues. >>> >> >> >>> >>> >> >> >>> Sub-task could also work, but as adding those features would >>> be also >>> >> >> >>> interesting for other things I would rather define them as >>> separate >>> >> >> >>> issues. >>> >> >> >>> >>> >> >> >>> >>> >> >> >> 2 New Features connected with the original jira it is then. >>> >> >> >> >>> >> >> >> >>> >> >> >>> If you would prefer to work in an own branch please tell me. >>> This >>> >> >> >>> could have the advantage that patches would not be affected by >>> >> changes >>> >> >> >>> in the trunk. >>> >> >> >>> >>> >> >> >>> Yes, a separate branch sounds good. >>> >> >> >> >>> >> >> >> best >>> >> >> >>> Rupert >>> >> >> >>> >>> >> >> >>> >> Regards, >>> >> >> >>> >> Cristian >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >> 2013/6/18 Rupert Westenthaler < >>> rupert.westentha...@gmail.com> >>> >> >> >>> >> >>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca >>> >> >> >>> >>> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> > Hi Rupert, >>> >> >> >>> >>> > >>> >> >> >>> >>> > Agreed on the >>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation >>> >> >> >>> >>> > data structure. >>> >> >> >>> >>> > >>> >> >> >>> >>> > Should I open up a Jira for all of this in order to >>> >> encapsulate >>> >> >> this >>> >> >> >>> >>> > information and establish the goals and these initial >>> steps >>> >> >> towards >>> >> >> >>> >>> these >>> >> >> >>> >>> > goals? >>> >> >> >>> >>> >>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great. >>> >> >> >>> >>> >>> >> >> >>> >>> > How should I proceed further? Should I create some design >>> >> >> documents >>> >> >> >>> that >>> >> >> >>> >>> > need to be reviewed? >>> >> >> >>> >>> >>> >> >> >>> >>> Usually it is the best to write design related text >>> directly in >>> >> >> JIRA >>> >> >> >>> >>> by using Markdown [1] syntax. 
This will allow us later to >>> use >>> >> this >>> >> >> >>> >>> text directly for the documentation on the Stanbol Webpage. >>> >> >> >>> >>> >>> >> >> >>> >>> best >>> >> >> >>> >>> Rupert >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/ >>> >> >> >>> >>> > >>> >> >> >>> >>> > Regards, >>> >> >> >>> >>> > Cristian >>> >> >> >>> >>> > >>> >> >> >>> >>> > >>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler < >>> rupert.westentha...@gmail.com> >>> >> >> >>> >>> > >>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca >>> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> >> > HI Rupert, >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions. >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler < >>> >> rupert.westentha...@gmail.com> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >> Hi Cristian, all >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> really interesting use case! >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on >>> how >>> >> this >>> >> >> >>> could >>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on >>> experiences >>> >> >> and >>> >> >> >>> >>> lessons >>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an >>> >> information >>> >> >> >>> system >>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this Project >>> >> excluded >>> >> >> the >>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because >>> the >>> >> >> Olympic >>> >> >> >>> >>> >> >> Information System was already providing event data >>> as XML >>> >> >> >>> messages) >>> >> >> >>> >>> >> >> the semantic search capabilities of this system >>> where very >>> >> >> >>> similar >>> >> >> >>> >>> as >>> >> >> >>> >>> >> >> the one described by your use case. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, >>> but a >>> >> >> formal >>> >> >> >>> >>> >> >> representation of the situation described by the >>> text. So >>> >> >> lets >>> >> >> >>> >>> assume >>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or Situation) >>> >> >> described >>> >> >> >>> in >>> >> >> >>> >>> the >>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some >>> advices on >>> >> >> how to >>> >> >> >>> >>> model >>> >> >> >>> >>> >> >> those. The important relation for modeling this >>> >> >> Participation: >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t)) >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> where .. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * ED are Endurants (continuants): Endurants do have >>> an >>> >> >> >>> identity so >>> >> >> >>> >>> we >>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced >>> by a >>> >> >> >>> setting. >>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as >>> well as >>> >> >> >>> >>> >> >> social-objects. >>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): Perdurants are >>> >> entities >>> >> >> that >>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ... 
>>> >> >> >>> >>> >> >> * PC are Participation: It is an time indexed >>> relation >>> >> where >>> >> >> >>> >>> >> >> Endurants participate in Perdurants >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some >>> intermediate >>> >> >> >>> resources >>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really handy to >>> define >>> >> one >>> >> >> >>> resource >>> >> >> >>> >>> >> >> being the context for all described data. I would >>> call >>> >> this >>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a >>> sub-concept to >>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about the >>> >> extracted >>> >> >> >>> >>> Setting >>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to annotate >>> that >>> >> >> >>> Endurant is >>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting >>> >> >> >>> fise:SettingAnnotation). >>> >> >> >>> >>> >> >> The Endurant itself is described by existing >>> >> >> fise:TextAnnotaion >>> >> >> >>> (the >>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested >>> Entities). >>> >> >> >>> Basically >>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an >>> >> >> EnhancementEngine >>> >> >> >>> to >>> >> >> >>> >>> >> >> state that several mentions (in possible different >>> >> >> sentences) do >>> >> >> >>> >>> >> >> represent the same Endurant as participating in the >>> >> Setting. >>> >> >> In >>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type >>> property >>> >> >> >>> (similar >>> >> >> >>> >>> as >>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) of >>> an >>> >> >> >>> participant >>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an >>> action) >>> >> Cause >>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a >>> passive >>> >> role >>> >> >> in >>> >> >> >>> an >>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but I am >>> >> >> wondering >>> >> >> >>> if >>> >> >> >>> >>> one >>> >> >> >>> >>> >> >> could extract those information. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a >>> >> Perdurant >>> >> >> in >>> >> >> >>> the >>> >> >> >>> >>> >> >> context of the Setting. Also >>> fise:OccurrentAnnotation can >>> >> >> link >>> >> >> >>> to >>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text >>> defining >>> >> the >>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation >>> suggesting >>> >> well >>> >> >> >>> known >>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a >>> country, >>> >> or >>> >> >> an >>> >> >> >>> >>> >> >> upraising ...). In addition fise:OccurrentAnnotation >>> can >>> >> >> define >>> >> >> >>> >>> >> >> dc:has-participant links to >>> fise:ParticipantAnnotation. In >>> >> >> this >>> >> >> >>> case >>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant (the >>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in this >>> Perturant >>> >> (the >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are >>> temporal >>> >> >> indexed >>> >> >> >>> this >>> >> >> >>> >>> >> >> annotation should also support properties for >>> defining the >>> >> >> >>> >>> >> >> xsd:dateTime for the start/end. 
>>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot of >>> sense >>> >> >> with >>> >> >> >>> the >>> >> >> >>> >>> >> remark >>> >> >> >>> >>> >> > that you probably won't be able to always extract the >>> date >>> >> >> for a >>> >> >> >>> >>> given >>> >> >> >>> >>> >> > setting(situation). >>> >> >> >>> >>> >> > There are 2 thing which are unclear though. >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which the >>> >> object >>> >> >> upon >>> >> >> >>> >>> which >>> >> >> >>> >>> >> the >>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a transitory >>> >> object ( >>> >> >> >>> such >>> >> >> >>> >>> as an >>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For >>> example >>> >> we >>> >> >> can >>> >> >> >>> >>> have >>> >> >> >>> >>> >> the >>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the Endurant >>> ( >>> >> >> Subject ) >>> >> >> >>> >>> which >>> >> >> >>> >>> >> > performs the action of "invading" on another >>> Eundurant, >>> >> namely >>> >> >> >>> >>> "Irak". >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the >>> Patient. >>> >> Both >>> >> >> >>> are >>> >> >> >>> >>> >> Endurants. The activity "invading" would be the >>> Perdurant. So >>> >> >> >>> ideally >>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" with: >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with the dc:type >>> >> >> caos:Agent, >>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a >>> >> >> >>> fise:EntityAnnotation >>> >> >> >>> >>> >> linking to dbpedia:United_States >>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with the dc:type >>> >> >> >>> caos:Patient, >>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a >>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq >>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" with the >>> dc:type >>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for >>> "invades" >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and >>> the >>> >> Object >>> >> >> >>> come >>> >> >> >>> >>> into >>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a >>> >> dc:"property" >>> >> >> >>> where >>> >> >> >>> >>> the >>> >> >> >>> >>> >> > property = verb which links to the Object in noun >>> form. For >>> >> >> >>> example >>> >> >> >>> >>> take >>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would have >>> the >>> >> >> "USA" >>> >> >> >>> >>> Entity >>> >> >> >>> >>> >> with >>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The >>> Endurant >>> >> >> would >>> >> >> >>> >>> have as >>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs which >>> link >>> >> it >>> >> >> to >>> >> >> >>> an >>> >> >> >>> >>> >> Object. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> As explained above you would have a >>> fise:OccurrentAnnotation >>> >> >> that >>> >> >> >>> >>> >> represents the Perdurant. The information that the >>> activity >>> >> >> >>> mention in >>> >> >> >>> >>> >> the text is "invades" would be by linking to a >>> >> >> >>> fise:TextAnnotation. 
If >>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that defines >>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could >>> also link >>> >> >> to an >>> >> >> >>> >>> >> fise:EntityAnnotation for this concept. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> best >>> >> >> >>> >>> >> Rupert >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > ### Consuming the data: >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> I think this model should be sufficient for >>> use-cases as >>> >> >> >>> described >>> >> >> >>> >>> by >>> >> >> >>> >>> >> you. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Users would be able to consume data on the setting >>> level. >>> >> >> This >>> >> >> >>> can >>> >> >> >>> >>> be >>> >> >> >>> >>> >> >> done my simple retrieving all >>> fise:ParticipantAnnotation >>> >> as >>> >> >> >>> well as >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW >>> this >>> >> was >>> >> >> the >>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It >>> allows >>> >> >> >>> queries for >>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you >>> could >>> >> filter >>> >> >> >>> for >>> >> >> >>> >>> >> >> Settings that involve a {Person}, >>> activities:Arrested and >>> >> a >>> >> >> >>> specific >>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach >>> you will >>> >> >> get >>> >> >> >>> >>> results >>> >> >> >>> >>> >> >> for Setting where the {Person} participated and an >>> other >>> >> >> person >>> >> >> >>> was >>> >> >> >>> >>> >> >> arrested. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> An other possibility would be to process enhancement >>> >> results >>> >> >> on >>> >> >> >>> the >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow to a much >>> >> higher >>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to correctly >>> answer >>> >> >> the >>> >> >> >>> query >>> >> >> >>> >>> >> >> used as an example above). But I am wondering if the >>> >> quality >>> >> >> of >>> >> >> >>> the >>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I >>> have >>> >> also >>> >> >> >>> doubts >>> >> >> >>> >>> if >>> >> >> >>> >>> >> >> this can be still realized by using semantic >>> indexing to >>> >> >> Apache >>> >> >> >>> Solr >>> >> >> >>> >>> >> >> or if it would be better/necessary to store results >>> in a >>> >> >> >>> TripleStore >>> >> >> >>> >>> >> >> and using SPARQL for retrieval. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] >>> is >>> >> also >>> >> >> very >>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 SPOTL(X) >>> >> >> >>> >>> Representation). >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> An other related Topic is the enrichment of Entities >>> >> >> (especially >>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings >>> extracted >>> >> form >>> >> >> >>> >>> Documents. >>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are >>> temporal >>> >> >> indexed. >>> >> >> >>> That >>> >> >> >>> >>> >> >> means that at the time when added to a knowledge >>> base they >>> >> >> might >>> >> >> >>> >>> still >>> >> >> >>> >>> >> >> be in process. So the creation, enriching and >>> refinement >>> >> of >>> >> >> such >>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to be >>> critical for >>> >> a >>> >> >> >>> System >>> >> >> >>> >>> >> >> like described in your use-case. 
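To illustrate the setting-level retrieval idea from above, here is a hedged sketch of such a query, kept as a plain string: it looks for Settings that involve a given person together with an "arrested" activity, using the fise:inSetting / fise:suggestion / dc:type structure proposed in this thread. Note that fise:ParticipantAnnotation and fise:OccurrentAnnotation do not exist yet, and the person and activity URIs are placeholders.

// hypothetical class, illustration only
public class SettingSearchQuery {

    /**
     * Sketch of a setting-level SPARQL query over the proposed annotation
     * structure. fise:entity-reference is the existing property linking an
     * fise:EntityAnnotation to the suggested entity; everything else follows
     * the proposal in this thread.
     */
    public static final String FIND_SETTINGS =
        "PREFIX fise: <http://fise.iks-project.eu/ontology/> \n" +
        "PREFIX dc:   <http://purl.org/dc/terms/> \n" +
        "SELECT DISTINCT ?setting WHERE { \n" +
        "  ?participant a fise:ParticipantAnnotation ; \n" +
        "               fise:inSetting ?setting ; \n" +
        "               fise:suggestion ?entityAnnotation . \n" +
        "  ?entityAnnotation fise:entity-reference <http://example.org/person/JohnDoe> . \n" +
        "  ?occurrent a fise:OccurrentAnnotation ; \n" +
        "             fise:inSetting ?setting ; \n" +
        "             dc:type <http://example.org/activities/Arrested> . \n" +
        "}";
}

As noted above, such a query only guarantees that both annotations participate in the same Setting, not that the given person is the one being arrested.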
>>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca >>> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> >> >> > >>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in the >>> >> field >>> >> >> of >>> >> >> >>> >>> semantic >>> >> >> >>> >>> >> >> > technologies, I've started to read about them in >>> the >>> >> last >>> >> >> 4-5 >>> >> >> >>> >>> >> >> months.Having >>> >> >> >>> >>> >> >> > said that I have a high level overview of what is >>> a good >>> >> >> >>> approach >>> >> >> >>> >>> to >>> >> >> >>> >>> >> >> solve >>> >> >> >>> >>> >> >> > this problem. There are a number of papers on the >>> >> internet >>> >> >> >>> which >>> >> >> >>> >>> >> describe >>> >> >> >>> >>> >> >> > what steps need to be taken such as : named entity >>> >> >> >>> recognition, >>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only >>> supports >>> >> >> >>> sentence >>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, NER >>> and >>> >> >> lemma. >>> >> >> >>> >>> support >>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is >>> >> currently >>> >> >> >>> >>> missing. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. >>> At >>> >> the >>> >> >> >>> moment >>> >> >> >>> >>> it >>> >> >> >>> >>> >> >> only supports English, but I do already work to >>> include >>> >> the >>> >> >> >>> other >>> >> >> >>> >>> >> >> supported languages. Other NLP framework that is >>> already >>> >> >> >>> integrated >>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But >>> note >>> >> >> that >>> >> >> >>> for >>> >> >> >>> >>> all >>> >> >> >>> >>> >> >> those the integration excludes support for >>> co-reference >>> >> and >>> >> >> >>> >>> dependency >>> >> >> >>> >>> >> >> trees. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a first >>> >> >> prototype >>> >> >> >>> by >>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available >>> - >>> >> Chunks >>> >> >> >>> (e.g. >>> >> >> >>> >>> >> >> Noun phrases). >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like >>> >> Relation >>> >> >> >>> >>> extraction >>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine? >>> >> >> >>> >>> >> > What kind of effort would be required for a >>> co-reference >>> >> >> >>> resolution >>> >> >> >>> >>> tool >>> >> >> >>> >>> >> > integration into Stanbol? >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But >>> before >>> >> we >>> >> >> can >>> >> >> >>> >>> >> build such an engine we would need to >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with >>> Annotations for >>> >> >> >>> >>> co-reference >>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those >>> >> >> annotation >>> >> >> >>> so >>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide >>> >> co-reference >>> >> >> >>> >>> >> information >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects: >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 1. 
Determine the best data structure to encapsulate >>> the >>> >> >> extracted >>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Don't make to to complex. Defining a proper structure to >>> >> >> represent >>> >> >> >>> >>> >> Events will only pay-off if we can also successfully >>> extract >>> >> >> such >>> >> >> >>> >>> >> information form processed texts. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> I would start with >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:SettingAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:ParticipantAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} >>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} >>> >> >> >>> >>> >> * fise:suggestion {entityAnnotation} (multiple if >>> there >>> >> are >>> >> >> >>> more >>> >> >> >>> >>> >> suggestions) >>> >> >> >>> >>> >> * dc:type one of fise:Agent, fise:Patient, >>> >> fise:Instrument, >>> >> >> >>> >>> fise:Cause >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:OccurrentAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} >>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} >>> >> >> >>> >>> >> * dc:type set to fise:Activity >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> If it turns out that we can extract more, we can add >>> more >>> >> >> >>> structure to >>> >> >> >>> >>> >> those annotations. We might also think about using an >>> own >>> >> >> namespace >>> >> >> >>> >>> >> for those extensions to the annotation structure. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated into >>> >> >> Stanbol. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure a >>> >> enhancement >>> >> >> >>> chain >>> >> >> >>> >>> >> that does NLP processing and EntityLinking. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> You should have a look at >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of >>> things >>> >> >> with >>> >> >> >>> NLP >>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via >>> verbs) to >>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit >>> dependency >>> >> >> trees >>> >> >> >>> >>> >> you code will need to do similar things with Nouns, >>> Pronouns >>> >> and >>> >> >> >>> >>> >> Verbs. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java >>> >> >> representation >>> >> >> >>> of >>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation >>> [2]. >>> >> >> >>> Something >>> >> >> >>> >>> >> similar will also be required by the >>> EventExtractionEngine >>> >> for >>> >> >> fast >>> >> >> >>> >>> >> access to such annotations while iterating over the >>> >> Sentences of >>> >> >> >>> the >>> >> >> >>> >>> >> text. 
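For the engine side, a rough skeleton of what such an EventExtractionEngine could look like. Only the wiring follows the existing EnhancementEngine / ServiceProperties interfaces; the body of computeEnhancements is pseudo-code for the steps discussed here, and the ordering constant is an assumption about where the engine should run in the chain.

package org.example.eventextraction; // hypothetical package, illustration only

import java.util.Collections;
import java.util.Map;

import org.apache.stanbol.enhancer.servicesapi.ContentItem;
import org.apache.stanbol.enhancer.servicesapi.EngineException;
import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
import org.apache.stanbol.enhancer.servicesapi.ServiceProperties;

/**
 * Skeleton of the EventExtractionEngine idea from this thread: it runs after
 * NLP processing and EntityLinking, iterates over the sentences, connects
 * nouns/pronouns to verbs (until real dependency trees are available) and
 * writes fise:SettingAnnotation, fise:ParticipantAnnotation and
 * fise:OccurrentAnnotation resources.
 */
public class EventExtractionEngine implements EnhancementEngine, ServiceProperties {

    public static final String NAME = "event-extraction";

    @Override
    public String getName() {
        return NAME;
    }

    @Override
    public int canEnhance(ContentItem ci) throws EngineException {
        // a real implementation should check that the AnalysedText content
        // part (created by the NLP engines) is present and return
        // CANNOT_ENHANCE otherwise
        return ENHANCE_ASYNC;
    }

    @Override
    public void computeEnhancements(ContentItem ci) throws EngineException {
        // 1. get the AnalysedText content part created by the NLP engines
        // 2. build an in-memory view of the existing fise:TextAnnotation and
        //    fise:EntityAnnotation (similar to DisambiguationData in the
        //    disambiguation-mlt engine)
        // 3. iterate over the sentences and connect noun/pronoun tokens to
        //    verbs (similar to the SentimentSummarizationEngine)
        // 4. write fise:SettingAnnotation, fise:ParticipantAnnotation and
        //    fise:OccurrentAnnotation triples to ci.getMetadata() while
        //    holding ci.getLock().writeLock()
    }

    @Override
    public Map<String, Object> getServiceProperties() {
        // run late in the chain, after NLP processing and entity linking
        return Collections.singletonMap(ServiceProperties.ENHANCEMENT_ENGINE_ORDERING,
                (Object) ServiceProperties.ORDERING_POST_PROCESSING);
    }
}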
>>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> best >>> >> >> >>> >>> >> Rupert >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> [1] >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >> >> >>> >> >>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java >>> >> >> >>> >>> >> [2] >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >> >> >>> >> >>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > Thanks >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion >>> >> >> >>> >>> >> >> best >>> >> >> >>> >>> >> >> Rupert >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> -- >>> >> >> >>> >>> >> >> | Rupert Westenthaler >>> >> >> rupert.westentha...@gmail.com >>> >> >> >>> >>> >> >> | Bodenlehenstraße 11 >>> >> >> >>> ++43-699-11108907 >>> >> >> >>> >>> >> >> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> -- >>> >> >> >>> >>> >> | Rupert Westenthaler >>> >> rupert.westentha...@gmail.com >>> >> >> >>> >>> >> | Bodenlehenstraße 11 >>> >> >> >>> ++43-699-11108907 >>> >> >> >>> >>> >> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> -- >>> >> >> >>> >>> | Rupert Westenthaler >>> rupert.westentha...@gmail.com >>> >> >> >>> >>> | Bodenlehenstraße 11 >>> >> >> ++43-699-11108907 >>> >> >> >>> >>> | A-5500 Bischofshofen >>> >> >> >>> >>> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> -- >>> >> >> >>> | Rupert Westenthaler >>> rupert.westentha...@gmail.com >>> >> >> >>> | Bodenlehenstraße 11 >>> ++43-699-11108907 >>> >> >> >>> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> -- >>> >> >> | Rupert Westenthaler rupert.westentha...@gmail.com >>> >> >> | Bodenlehenstraße 11 >>> ++43-699-11108907 >>> >> >> | A-5500 Bischofshofen >>> >> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> | Rupert Westenthaler rupert.westentha...@gmail.com >>> >> | Bodenlehenstraße 11 ++43-699-11108907 >>> >> | A-5500 Bischofshofen >>> >> >>> >>> >>> >>> -- >>> | Rupert Westenthaler rupert.westentha...@gmail.com >>> | Bodenlehenstraße 11 ++43-699-11108907 >>> | A-5500 Bischofshofen >>> >> >> >