Sorry, pressed send too soon :). Continued:
nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3), root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]

Given this, we can have for each "Token" an additional dependency annotation:

"stanbol.enhancer.nlp.dependency" : {
    "tag" : //is it necessary?
    "relations" : [
        {
            "type" : "nsubj",                   //type of relation
            "role" : "gov/dep",                 //whether the token is the governor (gov) or the dependent (dep)
            "dependencyValue" : "met",          //the word with which the token has a relation
            "dependencyIndexInSentence" : "2"   //the index of the dependency in the current sentence
        }
        ...
    ],
    "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
}

2013/9/1 Cristian Petroaca <cristian.petro...@gmail.com>

> Related to the Stanford Dependency Tree Feature, this is the way the
> output from the tool looks for this sentence: "Mary and Tom met Danny
> today":
>
>
> 2013/8/30 Cristian Petroaca <cristian.petro...@gmail.com>
>
>> Hi Rupert,
>>
>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>> the coref module I'm thinking I can represent the coreference information
>> this way:
>> Each "Token" or "Chunk" will contain an additional coref annotation with
>> the following structure:
>>
>> "stanbol.enhancer.nlp.coref" : {
>>     "tag" : //does this need to exist?
>>     "isRepresentative" : true/false, //whether this token or chunk is the representative mention in the chain
>>     "mentions" : [
>>         {
>>             "sentenceNo" : 1, //the sentence in which the mention is found
>>             "startWord" : 2,  //the first word making up the mention
>>             "endWord" : 3     //the last word making up the mention
>>         },
>>         ...
>>     ],
>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> }
>>
>> The CorefTag should resemble this model.
>>
>> What do you think?
>>
>> Cristian
>>
>>
>> 2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com>
>>
>>> Hi Cristian,
>>>
>>> you cannot directly call StanfordNLP components from Stanbol, but you
>>> have to extend the RESTful service to include the information you
>>> need. The main reason for that is that the license of StanfordNLP is
>>> not compatible with the Apache Software License, so Stanbol cannot
>>> directly link to the StanfordNLP API.
>>>
>>> You will need to
>>>
>>> 1. define an additional class {yourTag} extends Tag<{yourType}>
>>> in the o.a.s.enhancer.nlp module
>>> 2. add JSON parsing and serialization support for this tag to the
>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>>
>>> As (1) would be necessary anyway, the only additional thing you need to
>>> develop is (2). After that you can add {yourTag} instances to the
>>> AnalyzedText in the StanfordNLP integration. The
>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>> to your annotations.
>>>
>>> If you have a design for {yourTag} - the model you would like to use
>>> to represent your data - I can help with (1) and (2).
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>> <cristian.petro...@gmail.com> wrote:
>>> > Hi Rupert,
>>> >
>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see that
>>> > the Stanford NLP is not implemented as an EnhancementEngine but rather it
>>> > is used directly in a Jetty Server instance. How does that fit into the
>>> > Stanbol stack?
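To make Rupert's step (1) concrete, here is a rough sketch of what the DependencyTag proposed above could look like. Only a sketch: the field names mirror the JSON proposal, and it assumes the Tag base class in the o.a.s.enhancer.nlp module takes the tag string in its super constructor, the same way PosTag does.

package org.apache.stanbol.enhancer.nlp.dependency;

// import path assumed; the Tag base class lives in the o.a.s.enhancer.nlp module
import org.apache.stanbol.enhancer.nlp.model.tag.Tag;

/**
 * Sketch of a dependency relation tag, modelled after the JSON proposal above.
 */
public class DependencyTag extends Tag<DependencyTag> {

    /** Role of the annotated token within the relation. */
    public enum Role { GOV, DEP }

    private final String relationType;           // e.g. "nsubj"
    private final Role role;                      // governor or dependent
    private final String dependencyValue;         // the word the token is related to, e.g. "met"
    private final int dependencyIndexInSentence;  // index of that word in the current sentence

    public DependencyTag(String tag, String relationType, Role role,
            String dependencyValue, int dependencyIndexInSentence) {
        super(tag); // assumption: Tag(String) exists, as used by PosTag
        this.relationType = relationType;
        this.role = role;
        this.dependencyValue = dependencyValue;
        this.dependencyIndexInSentence = dependencyIndexInSentence;
    }

    public String getRelationType() { return relationType; }
    public Role getRole() { return role; }
    public String getDependencyValue() { return dependencyValue; }
    public int getDependencyIndexInSentence() { return dependencyIndexInSentence; }
}

The JSON support for step (2) would then simply serialize and parse exactly these fields, analogous to what PosTagSupport does for PosTag.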
For example how can I call the StanfordNlpAnalyzer's >>> routine >>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol >>> stack? >>> > >>> > Thanks, >>> > Cristian >>> > >>> > >>> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> > >>> >> Hi Cristian, >>> >> >>> >> Sorry for the late response, but I was offline for the last two weeks >>> >> >>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca >>> >> <cristian.petro...@gmail.com> wrote: >>> >> > Hi Rupert, >>> >> > >>> >> > After doing some tests it seems that the Stanford NLP coreference >>> module >>> >> is >>> >> > much more accurate than the Open NLP one.So I decided to extend >>> Stanford >>> >> > NLP to add coreference there. >>> >> >>> >> The Stanford NLP integration is not part of the Stanbol codebase >>> >> because the licenses are not compatible. >>> >> >>> >> You can find the Stanford NLP integration on >>> >> >>> >> https://github.com/westei/stanbol-stanfordnlp >>> >> >>> >> just create a fork and send pull requests. >>> >> >>> >> >>> >> > Could you add the necessary projects on the branch? And also remove >>> the >>> >> > Open NLP ones? >>> >> > >>> >> >>> >> Currently the branch >>> >> >>> >> >>> >> >>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ >>> >> >>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should >>> >> be enough for adding coreference support. >>> >> >>> >> IMO you will need to >>> >> >>> >> * add an model for representing coreference to the nlp module >>> >> * add parsing and serializing support to the nlp-json module >>> >> * add the implementation to your fork of the stanbol-stanfordnlp >>> project >>> >> >>> >> best >>> >> Rupert >>> >> >>> >> >>> >> >>> >> > Thanks, >>> >> > Cristian >>> >> > >>> >> > >>> >> > 2013/7/5 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> >> > >>> >> >> Hi Cristian, >>> >> >> >>> >> >> I created the branch at >>> >> >> >>> >> >> >>> >> >> >>> >> >>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ >>> >> >> >>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me know >>> if >>> >> >> you would like to have more >>> >> >> >>> >> >> best >>> >> >> Rupert >>> >> >> >>> >> >> >>> >> >> >>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca >>> >> >> <cristian.petro...@gmail.com> wrote: >>> >> >> > Hi Rupert, >>> >> >> > >>> >> >> > I created jiras : >>> https://issues.apache.org/jira/browse/STANBOL-1132and >>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The >>> original one >>> >> in >>> >> >> > dependent upon these. >>> >> >> > Please let me know when I can start using the branch. >>> >> >> > >>> >> >> > Thanks, >>> >> >> > Cristian >>> >> >> > >>> >> >> > >>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petro...@gmail.com> >>> >> >> > >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com> >>> >> >> >> >>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca >>> >> >> >>> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my >>> previous >>> >> >> e-mail. >>> >> >> >>> By >>> >> >> >>> > the way, does Open NLP have the ability to build dependency >>> trees? >>> >> >> >>> > >>> >> >> >>> >>> >> >> >>> AFAIK OpenNLP does not provide this feature. 
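Along the same lines, a sketch of the CorefTag behind the "stanbol.enhancer.nlp.coref" structure proposed earlier in this thread - again only illustrative, under the same assumption about the Tag base class; the Mention helper class and its fields simply mirror the proposed JSON:

package org.apache.stanbol.enhancer.nlp.coref;

import java.util.Collections;
import java.util.List;

// import path assumed; the Tag base class lives in the o.a.s.enhancer.nlp module
import org.apache.stanbol.enhancer.nlp.model.tag.Tag;

/**
 * Sketch of a co-reference tag: a flag marking the representative mention
 * plus the other mentions of the chain (sentence number, start word, end word).
 */
public class CorefTag extends Tag<CorefTag> {

    /** A single mention of the co-reference chain. */
    public static class Mention {
        private final int sentenceNo; // the sentence in which the mention is found
        private final int startWord;  // the first word making up the mention
        private final int endWord;    // the last word making up the mention

        public Mention(int sentenceNo, int startWord, int endWord) {
            this.sentenceNo = sentenceNo;
            this.startWord = startWord;
            this.endWord = endWord;
        }
        public int getSentenceNo() { return sentenceNo; }
        public int getStartWord() { return startWord; }
        public int getEndWord() { return endWord; }
    }

    private final boolean representative;
    private final List<Mention> mentions;

    public CorefTag(String tag, boolean representative, List<Mention> mentions) {
        super(tag); // assumption: Tag(String) exists, as used by PosTag
        this.representative = representative;
        this.mentions = Collections.unmodifiableList(mentions);
    }

    public boolean isRepresentative() { return representative; }
    public List<Mention> getMentions() { return mentions; }
}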
>>> >> >> >>> >>> >> >> >> >>> >> >> >> Then , since the Stanford NLP lib is also integrated into >>> Stanbol, >>> >> I'll >>> >> >> >> take a look at how I can extend its integration to include the >>> >> >> dependency >>> >> >> >> tree feature. >>> >> >> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >> > >>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com> >>> >> >> >>> > >>> >> >> >>> >> Hi Rupert, >>> >> >> >>> >> >>> >> >> >>> >> I created jira >>> >> https://issues.apache.org/jira/browse/STANBOL-1121. >>> >> >> >>> >> As you suggested I would start with extending the Stanford >>> NLP >>> >> with >>> >> >> >>> >> co-reference resolution but I think also with dependency >>> trees >>> >> >> because >>> >> >> >>> I >>> >> >> >>> >> also need to know the Subject of the sentence and the object >>> >> that it >>> >> >> >>> >> affects, right? >>> >> >> >>> >> >>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol >>> for >>> >> >> >>> >> co-reference and dependency trees, how do I proceed with >>> this? >>> >> Do I >>> >> >> >>> create >>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that can I >>> >> start >>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm done >>> I'll >>> >> send >>> >> >> >>> you >>> >> >> >>> >> guys the patch fo review? >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> I would create two "New Feature" type Issues one for adding >>> support >>> >> >> >>> for "dependency trees" and the other for "co-reference" >>> support. You >>> >> >> >>> should also define "depends on" relations between STANBOL-1121 >>> and >>> >> >> >>> those two new issues. >>> >> >> >>> >>> >> >> >>> Sub-task could also work, but as adding those features would >>> be also >>> >> >> >>> interesting for other things I would rather define them as >>> separate >>> >> >> >>> issues. >>> >> >> >>> >>> >> >> >>> >>> >> >> >> 2 New Features connected with the original jira it is then. >>> >> >> >> >>> >> >> >> >>> >> >> >>> If you would prefer to work in an own branch please tell me. >>> This >>> >> >> >>> could have the advantage that patches would not be affected by >>> >> changes >>> >> >> >>> in the trunk. >>> >> >> >>> >>> >> >> >>> Yes, a separate branch sounds good. >>> >> >> >> >>> >> >> >> best >>> >> >> >>> Rupert >>> >> >> >>> >>> >> >> >>> >> Regards, >>> >> >> >>> >> Cristian >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >> 2013/6/18 Rupert Westenthaler < >>> rupert.westentha...@gmail.com> >>> >> >> >>> >> >>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca >>> >> >> >>> >>> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> > Hi Rupert, >>> >> >> >>> >>> > >>> >> >> >>> >>> > Agreed on the >>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation >>> >> >> >>> >>> > data structure. >>> >> >> >>> >>> > >>> >> >> >>> >>> > Should I open up a Jira for all of this in order to >>> >> encapsulate >>> >> >> this >>> >> >> >>> >>> > information and establish the goals and these initial >>> steps >>> >> >> towards >>> >> >> >>> >>> these >>> >> >> >>> >>> > goals? >>> >> >> >>> >>> >>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great. >>> >> >> >>> >>> >>> >> >> >>> >>> > How should I proceed further? Should I create some design >>> >> >> documents >>> >> >> >>> that >>> >> >> >>> >>> > need to be reviewed? >>> >> >> >>> >>> >>> >> >> >>> >>> Usually it is the best to write design related text >>> directly in >>> >> >> JIRA >>> >> >> >>> >>> by using Markdown [1] syntax. 
This will allow us later to >>> use >>> >> this >>> >> >> >>> >>> text directly for the documentation on the Stanbol Webpage. >>> >> >> >>> >>> >>> >> >> >>> >>> best >>> >> >> >>> >>> Rupert >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/ >>> >> >> >>> >>> > >>> >> >> >>> >>> > Regards, >>> >> >> >>> >>> > Cristian >>> >> >> >>> >>> > >>> >> >> >>> >>> > >>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler < >>> rupert.westentha...@gmail.com> >>> >> >> >>> >>> > >>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca >>> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> >> > HI Rupert, >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions. >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler < >>> >> rupert.westentha...@gmail.com> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >> Hi Cristian, all >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> really interesting use case! >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on >>> how >>> >> this >>> >> >> >>> could >>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on >>> experiences >>> >> >> and >>> >> >> >>> >>> lessons >>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an >>> >> information >>> >> >> >>> system >>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this Project >>> >> excluded >>> >> >> the >>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because >>> the >>> >> >> Olympic >>> >> >> >>> >>> >> >> Information System was already providing event data >>> as XML >>> >> >> >>> messages) >>> >> >> >>> >>> >> >> the semantic search capabilities of this system >>> where very >>> >> >> >>> similar >>> >> >> >>> >>> as >>> >> >> >>> >>> >> >> the one described by your use case. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, >>> but a >>> >> >> formal >>> >> >> >>> >>> >> >> representation of the situation described by the >>> text. So >>> >> >> lets >>> >> >> >>> >>> assume >>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or Situation) >>> >> >> described >>> >> >> >>> in >>> >> >> >>> >>> the >>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some >>> advices on >>> >> >> how to >>> >> >> >>> >>> model >>> >> >> >>> >>> >> >> those. The important relation for modeling this >>> >> >> Participation: >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t)) >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> where .. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * ED are Endurants (continuants): Endurants do have >>> an >>> >> >> >>> identity so >>> >> >> >>> >>> we >>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced >>> by a >>> >> >> >>> setting. >>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as >>> well as >>> >> >> >>> >>> >> >> social-objects. >>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): Perdurants are >>> >> entities >>> >> >> that >>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ... 
>>> >> >> >>> >>> >> >> * PC are Participation: It is an time indexed >>> relation >>> >> where >>> >> >> >>> >>> >> >> Endurants participate in Perdurants >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some >>> intermediate >>> >> >> >>> resources >>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really handy to >>> define >>> >> one >>> >> >> >>> resource >>> >> >> >>> >>> >> >> being the context for all described data. I would >>> call >>> >> this >>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a >>> sub-concept to >>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about the >>> >> extracted >>> >> >> >>> >>> Setting >>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to annotate >>> that >>> >> >> >>> Endurant is >>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting >>> >> >> >>> fise:SettingAnnotation). >>> >> >> >>> >>> >> >> The Endurant itself is described by existing >>> >> >> fise:TextAnnotaion >>> >> >> >>> (the >>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested >>> Entities). >>> >> >> >>> Basically >>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an >>> >> >> EnhancementEngine >>> >> >> >>> to >>> >> >> >>> >>> >> >> state that several mentions (in possible different >>> >> >> sentences) do >>> >> >> >>> >>> >> >> represent the same Endurant as participating in the >>> >> Setting. >>> >> >> In >>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type >>> property >>> >> >> >>> (similar >>> >> >> >>> >>> as >>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) of >>> an >>> >> >> >>> participant >>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an >>> action) >>> >> Cause >>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a >>> passive >>> >> role >>> >> >> in >>> >> >> >>> an >>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but I am >>> >> >> wondering >>> >> >> >>> if >>> >> >> >>> >>> one >>> >> >> >>> >>> >> >> could extract those information. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a >>> >> Perdurant >>> >> >> in >>> >> >> >>> the >>> >> >> >>> >>> >> >> context of the Setting. Also >>> fise:OccurrentAnnotation can >>> >> >> link >>> >> >> >>> to >>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text >>> defining >>> >> the >>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation >>> suggesting >>> >> well >>> >> >> >>> known >>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a >>> country, >>> >> or >>> >> >> an >>> >> >> >>> >>> >> >> upraising ...). In addition fise:OccurrentAnnotation >>> can >>> >> >> define >>> >> >> >>> >>> >> >> dc:has-participant links to >>> fise:ParticipantAnnotation. In >>> >> >> this >>> >> >> >>> case >>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant (the >>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in this >>> Perturant >>> >> (the >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are >>> temporal >>> >> >> indexed >>> >> >> >>> this >>> >> >> >>> >>> >> >> annotation should also support properties for >>> defining the >>> >> >> >>> >>> >> >> xsd:dateTime for the start/end. 
>>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot of >>> sense >>> >> >> with >>> >> >> >>> the >>> >> >> >>> >>> >> remark >>> >> >> >>> >>> >> > that you probably won't be able to always extract the >>> date >>> >> >> for a >>> >> >> >>> >>> given >>> >> >> >>> >>> >> > setting(situation). >>> >> >> >>> >>> >> > There are 2 thing which are unclear though. >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which the >>> >> object >>> >> >> upon >>> >> >> >>> >>> which >>> >> >> >>> >>> >> the >>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a transitory >>> >> object ( >>> >> >> >>> such >>> >> >> >>> >>> as an >>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For >>> example >>> >> we >>> >> >> can >>> >> >> >>> >>> have >>> >> >> >>> >>> >> the >>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the Endurant >>> ( >>> >> >> Subject ) >>> >> >> >>> >>> which >>> >> >> >>> >>> >> > performs the action of "invading" on another >>> Eundurant, >>> >> namely >>> >> >> >>> >>> "Irak". >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the >>> Patient. >>> >> Both >>> >> >> >>> are >>> >> >> >>> >>> >> Endurants. The activity "invading" would be the >>> Perdurant. So >>> >> >> >>> ideally >>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" with: >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with the dc:type >>> >> >> caos:Agent, >>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a >>> >> >> >>> fise:EntityAnnotation >>> >> >> >>> >>> >> linking to dbpedia:United_States >>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with the dc:type >>> >> >> >>> caos:Patient, >>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a >>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq >>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" with the >>> dc:type >>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for >>> "invades" >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and >>> the >>> >> Object >>> >> >> >>> come >>> >> >> >>> >>> into >>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a >>> >> dc:"property" >>> >> >> >>> where >>> >> >> >>> >>> the >>> >> >> >>> >>> >> > property = verb which links to the Object in noun >>> form. For >>> >> >> >>> example >>> >> >> >>> >>> take >>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would have >>> the >>> >> >> "USA" >>> >> >> >>> >>> Entity >>> >> >> >>> >>> >> with >>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The >>> Endurant >>> >> >> would >>> >> >> >>> >>> have as >>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs which >>> link >>> >> it >>> >> >> to >>> >> >> >>> an >>> >> >> >>> >>> >> Object. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> As explained above you would have a >>> fise:OccurrentAnnotation >>> >> >> that >>> >> >> >>> >>> >> represents the Perdurant. The information that the >>> activity >>> >> >> >>> mention in >>> >> >> >>> >>> >> the text is "invades" would be by linking to a >>> >> >> >>> fise:TextAnnotation. 
If >>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that defines >>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could >>> also link >>> >> >> to an >>> >> >> >>> >>> >> fise:EntityAnnotation for this concept. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> best >>> >> >> >>> >>> >> Rupert >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > ### Consuming the data: >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> I think this model should be sufficient for >>> use-cases as >>> >> >> >>> described >>> >> >> >>> >>> by >>> >> >> >>> >>> >> you. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Users would be able to consume data on the setting >>> level. >>> >> >> This >>> >> >> >>> can >>> >> >> >>> >>> be >>> >> >> >>> >>> >> >> done my simple retrieving all >>> fise:ParticipantAnnotation >>> >> as >>> >> >> >>> well as >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW >>> this >>> >> was >>> >> >> the >>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It >>> allows >>> >> >> >>> queries for >>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you >>> could >>> >> filter >>> >> >> >>> for >>> >> >> >>> >>> >> >> Settings that involve a {Person}, >>> activities:Arrested and >>> >> a >>> >> >> >>> specific >>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach >>> you will >>> >> >> get >>> >> >> >>> >>> results >>> >> >> >>> >>> >> >> for Setting where the {Person} participated and an >>> other >>> >> >> person >>> >> >> >>> was >>> >> >> >>> >>> >> >> arrested. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> An other possibility would be to process enhancement >>> >> results >>> >> >> on >>> >> >> >>> the >>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow to a much >>> >> higher >>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to correctly >>> answer >>> >> >> the >>> >> >> >>> query >>> >> >> >>> >>> >> >> used as an example above). But I am wondering if the >>> >> quality >>> >> >> of >>> >> >> >>> the >>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I >>> have >>> >> also >>> >> >> >>> doubts >>> >> >> >>> >>> if >>> >> >> >>> >>> >> >> this can be still realized by using semantic >>> indexing to >>> >> >> Apache >>> >> >> >>> Solr >>> >> >> >>> >>> >> >> or if it would be better/necessary to store results >>> in a >>> >> >> >>> TripleStore >>> >> >> >>> >>> >> >> and using SPARQL for retrieval. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] >>> is >>> >> also >>> >> >> very >>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 SPOTL(X) >>> >> >> >>> >>> Representation). >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> An other related Topic is the enrichment of Entities >>> >> >> (especially >>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings >>> extracted >>> >> form >>> >> >> >>> >>> Documents. >>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are >>> temporal >>> >> >> indexed. >>> >> >> >>> That >>> >> >> >>> >>> >> >> means that at the time when added to a knowledge >>> base they >>> >> >> might >>> >> >> >>> >>> still >>> >> >> >>> >>> >> >> be in process. So the creation, enriching and >>> refinement >>> >> of >>> >> >> such >>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to be >>> critical for >>> >> a >>> >> >> >>> System >>> >> >> >>> >>> >> >> like described in your use-case. 
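To illustrate the setting-level retrieval idea from above, here is a hedged sketch of such a query, kept as a plain string: it looks for Settings that involve a given person together with an "arrested" activity, using the fise:inSetting / fise:suggestion / dc:type structure proposed in this thread. Note that fise:ParticipantAnnotation and fise:OccurrentAnnotation do not exist yet, and the person and activity URIs are placeholders.

// hypothetical class, illustration only
public class SettingSearchQuery {

    /**
     * Sketch of a setting-level SPARQL query over the proposed annotation
     * structure. fise:entity-reference is the existing property linking an
     * fise:EntityAnnotation to the suggested entity; everything else follows
     * the proposal in this thread.
     */
    public static final String FIND_SETTINGS =
        "PREFIX fise: <http://fise.iks-project.eu/ontology/> \n" +
        "PREFIX dc:   <http://purl.org/dc/terms/> \n" +
        "SELECT DISTINCT ?setting WHERE { \n" +
        "  ?participant a fise:ParticipantAnnotation ; \n" +
        "               fise:inSetting ?setting ; \n" +
        "               fise:suggestion ?entityAnnotation . \n" +
        "  ?entityAnnotation fise:entity-reference <http://example.org/person/JohnDoe> . \n" +
        "  ?occurrent a fise:OccurrentAnnotation ; \n" +
        "             fise:inSetting ?setting ; \n" +
        "             dc:type <http://example.org/activities/Arrested> . \n" +
        "}";
}

As noted above, such a query only guarantees that both annotations participate in the same Setting, not that the given person is the one being arrested.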
>>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca >>> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote: >>> >> >> >>> >>> >> >> > >>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in the >>> >> field >>> >> >> of >>> >> >> >>> >>> semantic >>> >> >> >>> >>> >> >> > technologies, I've started to read about them in >>> the >>> >> last >>> >> >> 4-5 >>> >> >> >>> >>> >> >> months.Having >>> >> >> >>> >>> >> >> > said that I have a high level overview of what is >>> a good >>> >> >> >>> approach >>> >> >> >>> >>> to >>> >> >> >>> >>> >> >> solve >>> >> >> >>> >>> >> >> > this problem. There are a number of papers on the >>> >> internet >>> >> >> >>> which >>> >> >> >>> >>> >> describe >>> >> >> >>> >>> >> >> > what steps need to be taken such as : named entity >>> >> >> >>> recognition, >>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only >>> supports >>> >> >> >>> sentence >>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, NER >>> and >>> >> >> lemma. >>> >> >> >>> >>> support >>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is >>> >> currently >>> >> >> >>> >>> missing. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. >>> At >>> >> the >>> >> >> >>> moment >>> >> >> >>> >>> it >>> >> >> >>> >>> >> >> only supports English, but I do already work to >>> include >>> >> the >>> >> >> >>> other >>> >> >> >>> >>> >> >> supported languages. Other NLP framework that is >>> already >>> >> >> >>> integrated >>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But >>> note >>> >> >> that >>> >> >> >>> for >>> >> >> >>> >>> all >>> >> >> >>> >>> >> >> those the integration excludes support for >>> co-reference >>> >> and >>> >> >> >>> >>> dependency >>> >> >> >>> >>> >> >> trees. >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a first >>> >> >> prototype >>> >> >> >>> by >>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available >>> - >>> >> Chunks >>> >> >> >>> (e.g. >>> >> >> >>> >>> >> >> Noun phrases). >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like >>> >> Relation >>> >> >> >>> >>> extraction >>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine? >>> >> >> >>> >>> >> > What kind of effort would be required for a >>> co-reference >>> >> >> >>> resolution >>> >> >> >>> >>> tool >>> >> >> >>> >>> >> > integration into Stanbol? >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But >>> before >>> >> we >>> >> >> can >>> >> >> >>> >>> >> build such an engine we would need to >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with >>> Annotations for >>> >> >> >>> >>> co-reference >>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those >>> >> >> annotation >>> >> >> >>> so >>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide >>> >> co-reference >>> >> >> >>> >>> >> information >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects: >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > 1. 
Determine the best data structure to encapsulate >>> the >>> >> >> extracted >>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Don't make to to complex. Defining a proper structure to >>> >> >> represent >>> >> >> >>> >>> >> Events will only pay-off if we can also successfully >>> extract >>> >> >> such >>> >> >> >>> >>> >> information form processed texts. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> I would start with >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:SettingAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:ParticipantAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} >>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} >>> >> >> >>> >>> >> * fise:suggestion {entityAnnotation} (multiple if >>> there >>> >> are >>> >> >> >>> more >>> >> >> >>> >>> >> suggestions) >>> >> >> >>> >>> >> * dc:type one of fise:Agent, fise:Patient, >>> >> fise:Instrument, >>> >> >> >>> >>> fise:Cause >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * fise:OccurrentAnnotation >>> >> >> >>> >>> >> * {fise:Enhancement} metadata >>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} >>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} >>> >> >> >>> >>> >> * dc:type set to fise:Activity >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> If it turns out that we can extract more, we can add >>> more >>> >> >> >>> structure to >>> >> >> >>> >>> >> those annotations. We might also think about using an >>> own >>> >> >> namespace >>> >> >> >>> >>> >> for those extensions to the annotation structure. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated into >>> >> >> Stanbol. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure a >>> >> enhancement >>> >> >> >>> chain >>> >> >> >>> >>> >> that does NLP processing and EntityLinking. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> You should have a look at >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of >>> things >>> >> >> with >>> >> >> >>> NLP >>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via >>> verbs) to >>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit >>> dependency >>> >> >> trees >>> >> >> >>> >>> >> you code will need to do similar things with Nouns, >>> Pronouns >>> >> and >>> >> >> >>> >>> >> Verbs. >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java >>> >> >> representation >>> >> >> >>> of >>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation >>> [2]. >>> >> >> >>> Something >>> >> >> >>> >>> >> similar will also be required by the >>> EventExtractionEngine >>> >> for >>> >> >> fast >>> >> >> >>> >>> >> access to such annotations while iterating over the >>> >> Sentences of >>> >> >> >>> the >>> >> >> >>> >>> >> text. 
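For the engine side, a rough skeleton of what such an EventExtractionEngine could look like. Only the wiring follows the existing EnhancementEngine / ServiceProperties interfaces; the body of computeEnhancements is pseudo-code for the steps discussed here, and the ordering constant is an assumption about where the engine should run in the chain.

package org.example.eventextraction; // hypothetical package, illustration only

import java.util.Collections;
import java.util.Map;

import org.apache.stanbol.enhancer.servicesapi.ContentItem;
import org.apache.stanbol.enhancer.servicesapi.EngineException;
import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
import org.apache.stanbol.enhancer.servicesapi.ServiceProperties;

/**
 * Skeleton of the EventExtractionEngine idea from this thread: it runs after
 * NLP processing and EntityLinking, iterates over the sentences, connects
 * nouns/pronouns to verbs (until real dependency trees are available) and
 * writes fise:SettingAnnotation, fise:ParticipantAnnotation and
 * fise:OccurrentAnnotation resources.
 */
public class EventExtractionEngine implements EnhancementEngine, ServiceProperties {

    public static final String NAME = "event-extraction";

    @Override
    public String getName() {
        return NAME;
    }

    @Override
    public int canEnhance(ContentItem ci) throws EngineException {
        // a real implementation should check that the AnalysedText content
        // part (created by the NLP engines) is present and return
        // CANNOT_ENHANCE otherwise
        return ENHANCE_ASYNC;
    }

    @Override
    public void computeEnhancements(ContentItem ci) throws EngineException {
        // 1. get the AnalysedText content part created by the NLP engines
        // 2. build an in-memory view of the existing fise:TextAnnotation and
        //    fise:EntityAnnotation (similar to DisambiguationData in the
        //    disambiguation-mlt engine)
        // 3. iterate over the sentences and connect noun/pronoun tokens to
        //    verbs (similar to the SentimentSummarizationEngine)
        // 4. write fise:SettingAnnotation, fise:ParticipantAnnotation and
        //    fise:OccurrentAnnotation triples to ci.getMetadata() while
        //    holding ci.getLock().writeLock()
    }

    @Override
    public Map<String, Object> getServiceProperties() {
        // run late in the chain, after NLP processing and entity linking
        return Collections.singletonMap(ServiceProperties.ENHANCEMENT_ENGINE_ORDERING,
                (Object) ServiceProperties.ORDERING_POST_PROCESSING);
    }
}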
>>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> best >>> >> >> >>> >>> >> Rupert >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> [1] >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >> >> >>> >> >>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java >>> >> >> >>> >>> >> [2] >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >> >> >>> >> >>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > Thanks >>> >> >> >>> >>> >> > >>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion >>> >> >> >>> >>> >> >> best >>> >> >> >>> >>> >> >> Rupert >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >> -- >>> >> >> >>> >>> >> >> | Rupert Westenthaler >>> >> >> rupert.westentha...@gmail.com >>> >> >> >>> >>> >> >> | Bodenlehenstraße 11 >>> >> >> >>> ++43-699-11108907 >>> >> >> >>> >>> >> >> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> >>> >> >> >>> >>> >> -- >>> >> >> >>> >>> >> | Rupert Westenthaler >>> >> rupert.westentha...@gmail.com >>> >> >> >>> >>> >> | Bodenlehenstraße 11 >>> >> >> >>> ++43-699-11108907 >>> >> >> >>> >>> >> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> >>> >> >> >>> >>> -- >>> >> >> >>> >>> | Rupert Westenthaler >>> rupert.westentha...@gmail.com >>> >> >> >>> >>> | Bodenlehenstraße 11 >>> >> >> ++43-699-11108907 >>> >> >> >>> >>> | A-5500 Bischofshofen >>> >> >> >>> >>> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> -- >>> >> >> >>> | Rupert Westenthaler >>> rupert.westentha...@gmail.com >>> >> >> >>> | Bodenlehenstraße 11 >>> ++43-699-11108907 >>> >> >> >>> | A-5500 Bischofshofen >>> >> >> >>> >>> >> >> >> >>> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> -- >>> >> >> | Rupert Westenthaler rupert.westentha...@gmail.com >>> >> >> | Bodenlehenstraße 11 >>> ++43-699-11108907 >>> >> >> | A-5500 Bischofshofen >>> >> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> | Rupert Westenthaler rupert.westentha...@gmail.com >>> >> | Bodenlehenstraße 11 ++43-699-11108907 >>> >> | A-5500 Bischofshofen >>> >> >>> >>> >>> >>> -- >>> | Rupert Westenthaler rupert.westentha...@gmail.com >>> | Bodenlehenstraße 11 ++43-699-11108907 >>> | A-5500 Bischofshofen >>> >> >> >