Re: Relation extraction feature

Cristian Petroaca Fri, 30 Aug 2013 12:29:08 -0700

Hi Rupert,

OK, so after looking at the JSON output from the Stanford NLP server and
the coref module, I think I can represent the coreference information
this way:
Each "Token" or "Chunk" will contain an additional coref annotation with
the following structure:


"stanbol.enhancer.nlp.coref" : {
    "tag" : // does this need to exist?
    // whether this token or chunk is the representative mention in the chain
    "isRepresentative" : true/false,
    "mentions" : [ {
            "sentenceNo" : 1, // the sentence in which the mention is found
            "startWord" : 2,  // the first word making up the mention
            "endWord" : 3     // the last word making up the mention
        }, ...
    ],
    "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
}

The CorefTag should resemble this model.
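
To make the model a bit more concrete, here is a rough Java sketch of what I
have in mind. It is only an illustration under my own assumptions: the
MentionRef class and all field and method names are placeholders, and the
sketch deliberately does not extend Tag<...> from the nlp module so that it
compiles without Stanbol on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical pointer to one mention in a coreference chain. */
final class MentionRef {
    final int sentenceNo; // the sentence in which the mention is found
    final int startWord;  // the first word making up the mention
    final int endWord;    // the last word making up the mention

    MentionRef(int sentenceNo, int startWord, int endWord) {
        if (startWord > endWord) {
            throw new IllegalArgumentException("startWord must be <= endWord");
        }
        this.sentenceNo = sentenceNo;
        this.startWord = startWord;
        this.endWord = endWord;
    }
}

/**
 * Hypothetical stand-alone sketch of the coref annotation value.
 * A real version would extend Tag<...> in the nlp module; that is
 * left out here to keep the example self-contained.
 */
final class CorefTag {
    private final boolean representative;
    private final List<MentionRef> mentions;

    CorefTag(boolean representative, List<MentionRef> mentions) {
        this.representative = representative;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.mentions = Collections.unmodifiableList(
                new ArrayList<MentionRef>(mentions));
    }

    /** True if this token or chunk is the representative mention. */
    boolean isRepresentative() {
        return representative;
    }

    /** The other mentions in the chain, as read-only references. */
    List<MentionRef> getMentions() {
        return mentions;
    }
}
```

The JSON support in the nlp-json module would then map these three fields of
each mention, plus the isRepresentative flag, to the structure above.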

What do you think?

Cristian


2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com>

> Hi Cristian,
>
> You cannot call StanfordNLP components directly from Stanbol; you
> have to extend the RESTful service to include the information you
> need. The main reason is that the license of StanfordNLP is
> not compatible with the Apache Software License, so Stanbol cannot
> link directly to the StanfordNLP API.
>
> You will need to
>
> 1. define an additional class {yourTag} extends Tag<{yourType}> class
> in the o.a.s.enhancer.nlp module
> 2. add JSON parsing and serialization support for this tag to the
> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>
> As (1) would be necessary anyway, the only additional thing you need to
> develop is (2). After that you can add {yourTag} instances to the
> AnalyzedText in the StanfordNLP integration. The
> RestfulNlpAnalysisEngine will parse them from the response. All
> engines executed after the RestfulNlpAnalysisEngine will have access
> to your annotations.
>
> If you have a design for {yourTag} - the model you would like to use
> to represent your data - I can help with (1) and (2).
>
> best
> Rupert
>
>
> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> > Hi Rupert,
> >
> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see
> > that Stanford NLP is not implemented as an EnhancementEngine; rather, it
> > is used directly in a Jetty server instance. How does that fit into the
> > Stanbol stack? For example, how can I call the StanfordNlpAnalyzer's
> > routine from my TripleExtractionEnhancementEngine, which lives in the
> > Stanbol stack?
> >
> > Thanks,
> > Cristian
> >
> >
> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >
> >> Hi Cristian,
> >>
> >> Sorry for the late response, but I was offline for the last two weeks
> >>
> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
> >> <cristian.petro...@gmail.com> wrote:
> >> > Hi Rupert,
> >> >
> >> > After doing some tests it seems that the Stanford NLP coreference
> >> > module is much more accurate than the Open NLP one. So I decided to
> >> > extend Stanford NLP to add coreference there.
> >>
> >> The Stanford NLP integration is not part of the Stanbol codebase
> >> because the licenses are not compatible.
> >>
> >> You can find the Stanford NLP integration on
> >>
> >>     https://github.com/westei/stanbol-stanfordnlp
> >>
> >> just create a fork and send pull requests.
> >>
> >>
> >> > Could you add the necessary projects on the branch? And also remove
> >> > the Open NLP ones?
> >> >
> >>
> >> Currently the branch
> >>
> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>
> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
> >> be enough for adding coreference support.
> >>
> >> IMO you will need to
> >>
> >> * add a model for representing coreference to the nlp module
> >> * add parsing and serializing support to the nlp-json module
> >> * add the implementation to your fork of the stanbol-stanfordnlp project
> >>
> >> best
> >> Rupert
> >>
> >>
> >>
> >> > Thanks,
> >> > Cristian
> >> >
> >> >
> >> > 2013/7/5 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >
> >> >> Hi Cristian,
> >> >>
> >> >> I created the branch at
> >> >>
> >> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >> >>
> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if
> >> >> you would like to have more.
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
> >> >> <cristian.petro...@gmail.com> wrote:
> >> >> > Hi Rupert,
> >> >> >
> >> >> > I created two JIRA issues:
> >> >> > https://issues.apache.org/jira/browse/STANBOL-1132 and
> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The original
> >> >> > one is dependent upon these.
> >> >> > Please let me know when I can start using the branch.
> >> >> >
> >> >> > Thanks,
> >> >> > Cristian
> >> >> >
> >> >> >
> >> >> > 2013/6/27 Cristian Petroaca <cristian.petro...@gmail.com>
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>
> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> >> >> >>> <cristian.petro...@gmail.com> wrote:
> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford, in my previous
> >> >> >>> > e-mail. By the way, does Open NLP have the ability to build
> >> >> >>> > dependency trees?
> >> >> >>> >
> >> >> >>>
> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >> >> >>>
> >> >> >>
> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol,
> >> >> >> I'll take a look at how I can extend its integration to include the
> >> >> >> dependency tree feature.
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>  >
> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com>
> >> >> >>> >
> >> >> >>> >> Hi Rupert,
> >> >> >>> >>
> >> >> >>> >> I created the JIRA issue
> >> >> >>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >> >> >>> >> As you suggested, I would start by extending Stanford NLP with
> >> >> >>> >> co-reference resolution, but I think also with dependency trees,
> >> >> >>> >> because I also need to know the subject of the sentence and the
> >> >> >>> >> object that it affects, right?
> >> >> >>> >>
> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for
> >> >> >>> >> co-reference and dependency trees, how do I proceed with this?
> >> >> >>> >> Do I create 2 new sub-tasks under the already opened JIRA issue?
> >> >> >>> >> After that, can I start implementing on my local copy of Stanbol
> >> >> >>> >> and, when I'm done, send you guys the patch for review?
> >> >> >>> >>
> >> >> >>>
> >> >> >>> I would create two "New Feature" type issues, one for adding
> >> >> >>> support for "dependency trees" and the other for "co-reference"
> >> >> >>> support. You should also define "depends on" relations between
> >> >> >>> STANBOL-1121 and those two new issues.
> >> >> >>>
> >> >> >>> Sub-tasks could also work, but as adding those features would also
> >> >> >>> be interesting for other things, I would rather define them as
> >> >> >>> separate issues.
> >> >> >>>
> >> >> >>>
> >> >> >> 2 New Features connected with the original jira it is then.
> >> >> >>
> >> >> >>
> >> >> >>> If you would prefer to work in an own branch please tell me. This
> >> >> >>> could have the advantage that patches would not be affected by
> >> changes
> >> >> >>> in the trunk.
> >> >> >>>
> >> >> >>> Yes, a separate branch sounds good.
> >> >> >>
> >> >> >> best
> >> >> >>> Rupert
> >> >> >>>
> >> >> >>> >> Regards,
> >> >> >>> >> Cristian
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> 2013/6/18 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >> >> >>> >>
> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
> >> >> >>> >>> <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> > Hi Rupert,
> >> >> >>> >>> >
> >> >> >>> >>> > Agreed on the
> >> >> >>> >>> > SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
> >> >> >>> >>> > data structure.
> >> >> >>> >>> >
> >> >> >>> >>> > Should I open a JIRA issue for all of this, in order to
> >> >> >>> >>> > encapsulate this information and establish the goals and the
> >> >> >>> >>> > initial steps towards these goals?
> >> >> >>> >>>
> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
> >> >> >>> >>>
> >> >> >>> >>> > How should I proceed further? Should I create some design
> >> >> >>> >>> > documents that need to be reviewed?
> >> >> >>> >>>
> >> >> >>> >>> Usually it is best to write design-related text directly in
> >> >> >>> >>> JIRA using Markdown [1] syntax. This will allow us to reuse
> >> >> >>> >>> this text later for the documentation on the Stanbol webpage.
> >> >> >>> >>>
> >> >> >>> >>> best
> >> >> >>> >>> Rupert
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
> >> >> >>> >>> >
> >> >> >>> >>> > Regards,
> >> >> >>> >>> > Cristian
> >> >> >>> >>> >
> >> >> >>> >>> >
> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
> rupert.westentha...@gmail.com>
> >> >> >>> >>> >
> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> >> > HI Rupert,
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
> >> rupert.westentha...@gmail.com>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> >> Hi Cristian, all
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> really interesting use case!
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could
> >> >> >>> >>> >> >> work out. These suggestions are mainly based on experiences and
> >> >> >>> >>> >> >> lessons learned in the LIVE [2] project, where we built an
> >> >> >>> >>> >> >> information system for the Olympic Games in Peking. While this
> >> >> >>> >>> >> >> project excluded the extraction of events from unstructured text
> >> >> >>> >>> >> >> (because the Olympic Information System was already providing event
> >> >> >>> >>> >> >> data as XML messages), the semantic search capabilities of this
> >> >> >>> >>> >> >> system were very similar to the ones described by your use case.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a formal
> >> >> >>> >>> >> >> representation of the situation described by the text. So let's
> >> >> >>> >>> >> >> assume that the goal is to annotate a Setting (or Situation)
> >> >> >>> >>> >> >> described in the text - a fise:SettingAnnotation.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to
> >> >> >>> >>> >> >> model those. The important relation for modeling this is
> >> >> >>> >>> >> >> Participation:
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> where ..
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do have an identity, so
> >> >> >>> >>> >> >> we would typically refer to them as Entities referenced by a
> >> >> >>> >>> >> >> setting. Note that this includes physical, non-physical as well as
> >> >> >>> >>> >> >> social objects.
> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents): Perdurants are entities that
> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ...
> >> >> >>> >>> >> >>  * PC is Participation: it is a time-indexed relation where
> >> >> >>> >>> >> >> Endurants participate in Perdurants
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate resources,
> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to define one resource
> >> >> >>> >>> >> >> as the context for all described data. I would call this
> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a sub-concept of
> >> >> >>> >>> >> >> fise:Enhancement. All further enhancements about the extracted
> >> >> >>> >>> >> >> Setting would define a "fise:in-setting" relation to it.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant
> >> >> >>> >>> >> >> is participating in a setting (fise:in-setting
> >> >> >>> >>> >> >> fise:SettingAnnotation). The Endurant itself is described by
> >> >> >>> >>> >> >> existing fise:TextAnnotation (the mentions) and
> >> >> >>> >>> >> >> fise:EntityAnnotation (suggested Entities). Basically the
> >> >> >>> >>> >> >> fise:ParticipantAnnotation will allow an EnhancementEngine to state
> >> >> >>> >>> >> >> that several mentions (possibly in different sentences) represent
> >> >> >>> >>> >> >> the same Endurant as participating in the Setting. In addition it
> >> >> >>> >>> >> >> would be possible to use the dc:type property (similar to
> >> >> >>> >>> >> >> fise:TextAnnotation) to refer to the role(s) of a participant
> >> >> >>> >>> >> >> (e.g. the set: Agent (intentionally performs an action), Cause
> >> >> >>> >>> >> >> (unintentionally, e.g. a mud slide), Patient (a passive role in an
> >> >> >>> >>> >> >> activity) and Instrument (aids a process)), but I am wondering if
> >> >> >>> >>> >> >> one could extract this information.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
> >> >> >>> >>> >> >> context of the Setting. A fise:OccurrentAnnotation can also link to
> >> >> >>> >>> >> >> fise:TextAnnotation (typically verbs in the text defining the
> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation suggesting well-known
> >> >> >>> >>> >> >> Events in a knowledge base (e.g. an election in a country, or an
> >> >> >>> >>> >> >> uprising ...). In addition fise:OccurrentAnnotation can define
> >> >> >>> >>> >> >> dc:has-participant links to fise:ParticipantAnnotation. In this
> >> >> >>> >>> >> >> case it is explicitly stated that an Endurant (the
> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are temporally indexed,
> >> >> >>> >>> >> >> this annotation should also support properties for defining the
> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> > Indeed, an event-based data structure makes a lot of sense, with the
> >> >> >>> >>> >> > remark that you probably won't always be able to extract the date
> >> >> >>> >>> >> > for a given setting (situation).
> >> >> >>> >>> >> > There are 2 things which are unclear though.
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 1. Perdurant: You could have situations in which the object upon
> >> >> >>> >>> >> > which the Subject (or Endurant) is acting is not a transitory object
> >> >> >>> >>> >> > (such as an event or activity) but rather another Endurant. For
> >> >> >>> >>> >> > example, we can have the phrase "USA invades Irak", where "USA" is
> >> >> >>> >>> >> > the Endurant (Subject) which performs the action of "invading" on
> >> >> >>> >>> >> > another Endurant, namely "Irak".
> >> >> >>> >>> >> >
> >> >> >>> >>> >>
> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both
> >> >> >>> >>> >> are Endurants. The activity "invading" would be the Perdurant. So
> >> >> >>> >>> >> ideally you would have a "fise:SettingAnnotation" with:
> >> >> >>> >>> >>
> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:United_States
> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type
> >> >> >>> >>> >> caos:Patient, linking to a fise:TextAnnotation for "Irak" and a
> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq
> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the dc:type
> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for "invades"
> >> >> >>> >>> >>
> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the Object, come
> >> >> >>> >>> >> > into this? I imagined that the Endurant would have a dc:"property"
> >> >> >>> >>> >> > where the property = verb, which links to the Object in noun form.
> >> >> >>> >>> >> > For example, take again the sentence "USA invades Irak". You would
> >> >> >>> >>> >> > have the "USA" Entity with dc:invader, which points to the Object
> >> >> >>> >>> >> > "Irak". The Endurant would have as many dc:"property" elements as
> >> >> >>> >>> >> > there are verbs which link it to an Object.
> >> >> >>> >>> >>
> >> >> >>> >>> >> As explained above, you would have a fise:OccurrentAnnotation that
> >> >> >>> >>> >> represents the Perdurant. The information that the activity
> >> >> >>> >>> >> mentioned in the text is "invades" would be given by linking to a
> >> >> >>> >>> >> fise:TextAnnotation. If you can also provide an Ontology for Tasks
> >> >> >>> >>> >> that defines "myTasks:invade", the fise:OccurrentAnnotation could
> >> >> >>> >>> >> also link to a fise:EntityAnnotation for this concept.
> >> >> >>> >>> >>
> >> >> >>> >>> >> best
> >> >> >>> >>> >> Rupert
> >> >> >>> >>> >>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> >> ### Consuming the data:
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> I think this model should be sufficient for use cases as described
> >> >> >>> >>> >> >> by you.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Users would be able to consume data on the setting level. This can
> >> >> >>> >>> >> >> be done by simply retrieving all fise:ParticipantAnnotation as well
> >> >> >>> >>> >> >> as fise:OccurrentAnnotation linked with a setting. BTW this was the
> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It allows queries
> >> >> >>> >>> >> >> for Settings that involve specific Entities, e.g. you could filter
> >> >> >>> >>> >> >> for Settings that involve a {Person}, activities:Arrested and a
> >> >> >>> >>> >> >> specific {Uprising}. However, note that with this approach you will
> >> >> >>> >>> >> >> also get results for Settings where the {Person} participated and
> >> >> >>> >>> >> >> another person was arrested.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the
> >> >> >>> >>> >> >> fise:OccurrentAnnotation level. This would allow a much higher
> >> >> >>> >>> >> >> level of granularity (e.g. it would allow correctly answering the
> >> >> >>> >>> >> >> query used as an example above). But I am wondering if the quality
> >> >> >>> >>> >> >> of the Setting extraction will be sufficient for this. I also have
> >> >> >>> >>> >> >> doubts whether this can still be realized by using semantic indexing
> >> >> >>> >>> >> >> to Apache Solr, or if it would be better/necessary to store results
> >> >> >>> >>> >> >> in a TripleStore and use SPARQL for retrieval.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] is also very
> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 SPOTL(X)
> >> >> >>> >>> >> >> Representation).
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially
> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings extracted from
> >> >> >>> >>> >> >> Documents. As per definition - in DOLCE - Perdurants are temporally
> >> >> >>> >>> >> >> indexed. That means that at the time when added to a knowledge base
> >> >> >>> >>> >> >> they might still be in process. So the creation, enrichment and
> >> >> >>> >>> >> >> refinement of such Entities in the knowledge base seems to be
> >> >> >>> >>> >> >> critical for a system like the one described in your use case.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote:
> >> >> >>> >>> >> >> >
> >> >> >>> >>> >> >> > First of all I have to mention that I am new to the field of
> >> >> >>> >>> >> >> > semantic technologies; I started to read about them in the last
> >> >> >>> >>> >> >> > 4-5 months. Having said that, I have a high-level overview of what
> >> >> >>> >>> >> >> > a good approach to solving this problem is. There are a number of
> >> >> >>> >>> >> >> > papers on the internet which describe what steps need to be taken,
> >> >> >>> >>> >> >> > such as: named entity recognition, co-reference resolution, POS
> >> >> >>> >>> >> >> > tagging and others.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence
> >> >> >>> >>> >> >> detection, tokenization, POS tagging, chunking, NER and lemma.
> >> >> >>> >>> >> >> Support for co-reference resolution and dependency trees is
> >> >> >>> >>> >> >> currently missing.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment
> >> >> >>> >>> >> >> it only supports English, but I am already working on including the
> >> >> >>> >>> >> >> other supported languages. Other NLP frameworks that are already
> >> >> >>> >>> >> >> integrated with Stanbol are Freeling [5] and Talismane [6]. But
> >> >> >>> >>> >> >> note that for all of those the integration excludes support for
> >> >> >>> >>> >> >> co-reference and dependency trees.
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> Anyway, I am confident that one can implement a first prototype by
> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available - Chunks (e.g.
> >> >> >>> >>> >> >> noun phrases).
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like Relation
> >> >> >>> >>> >> > extraction would be implemented as an EnhancementEngine?
> >> >> >>> >>> >> > What kind of effort would be required to integrate a co-reference
> >> >> >>> >>> >> > resolution tool into Stanbol?
> >> >> >>> >>> >> >
> >> >> >>> >>> >>
> >> >> >>> >>> >> Yes, in the end it would be an EnhancementEngine. But before we can
> >> >> >>> >>> >> build such an engine we would need to
> >> >> >>> >>> >>
> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations for
> >> >> >>> >>> >> co-reference
> >> >> >>> >>> >> * add support for JSON serialisation/parsing for those annotations,
> >> >> >>> >>> >> so that the RESTful NLP Analysis Service can provide co-reference
> >> >> >>> >>> >> information
> >> >> >>> >>> >>
> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the extracted
> >> >> >>> >>> >> > information. I'll take a closer look at DOLCE.
> >> >> >>> >>> >>
> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent
> >> >> >>> >>> >> Events will only pay off if we can also successfully extract such
> >> >> >>> >>> >> information from processed texts.
> >> >> >>> >>> >>
> >> >> >>> >>> >> I would start with
> >> >> >>> >>> >>
> >> >> >>> >>> >>  * fise:SettingAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>
> >> >> >>> >>> >>  * fise:ParticipantAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there are
> >> >> >>> >>> >>       more suggestions)
> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
> >> >> >>> >>> >>       fise:Cause
> >> >> >>> >>> >>
> >> >> >>> >>> >>  * fise:OccurrentAnnotation
> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >> >> >>> >>> >>     * dc:type set to fise:Activity
> >> >> >>> >>> >>
> >> >> >>> >>> >> If it turns out that we can extract more, we can add more structure
> >> >> >>> >>> >> to those annotations. We might also think about using a separate
> >> >> >>> >>> >> namespace for those extensions to the annotation structure.
> >> >> >>> >>> >>
> >> >> >>> >>> >> > 2. Determine how all of this should be integrated into Stanbol.
> >> >> >>> >>> >>
> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement
> >> >> >>> >>> >> chain that does NLP processing and EntityLinking.
> >> >> >>> >>> >>
> >> >> >>> >>> >> You should have a look at
> >> >> >>> >>> >>
> >> >> >>> >>> >> * SentimentSummarizationEngine [1], as it does a lot of things with
> >> >> >>> >>> >> NLP processing results (e.g. connecting adjectives (via verbs) to
> >> >> >>> >>> >> nouns/pronouns). As long as we cannot use explicit dependency trees,
> >> >> >>> >>> >> your code will need to do similar things with Nouns, Pronouns and
> >> >> >>> >>> >> Verbs.
> >> >> >>> >>> >>
> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java representation of
> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation [2].
> >> >> >>> >>> >> Something similar will also be required by the
> >> >> >>> >>> >> EventExtractionEngine for fast access to such annotations while
> >> >> >>> >>> >> iterating over the Sentences of the text.
> >> >> >>> >>> >>
> >> >> >>> >>> >>
> >> >> >>> >>> >> best
> >> >> >>> >>> >> Rupert
> >> >> >>> >>> >>
> >> >> >>> >>> >> [1]
> >> >> >>> >>> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >> >> >>> >>> >> [2]
> >> >> >>> >>> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >> >> >>> >>> >>
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > Thanks
> >> >> >>> >>> >> >
> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
> >> >> >>> >>> >> >> best
> >> >> >>> >>> >> >> Rupert
> >> >> >>> >>> >> >>
> >> >> >>> >>> >> >> --
> >> >> >>> >>> >> >> | Rupert Westenthaler
> >> >> rupert.westentha...@gmail.com
> >> >> >>> >>> >> >> | Bodenlehenstraße 11
> >> >> >>> ++43-699-11108907
> >> >> >>> >>> >> >> | A-5500 Bischofshofen
> >> >> >>> >>> >> >>
