Related to the default chain selection rules : before the restart I had a chain
named 'default', i.e. I could access it via enhancer/chain/default. Then I just
added another engine to that 'default' chain. I assumed that after the restart
the chain with the 'default' name would be persisted, so the first selection
rule should have applied after the restart as well. But instead I cannot reach
it via enhancer/chain/default anymore, so it's gone.
Anyway, this is not a big deal and it's not blocking me in any way; I just
wanted to understand where the problem is.


2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:

> Hi Cristian
>
> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> > 1. Updated to the latest code and it's gone. Cool
> >
> > 2. I start the stable launcher -> create a new instance of the
> > PosChunkerEngine -> add it to the default chain. At this point everything
> > looks good and works ok.
> > After I restart the server the default chain is gone and instead I see this
> > in the enhancement chains page : all-active (default, id: 149, ranking: 0,
> > impl: AllActiveEnginesChain). all-active did not contain the 'default'
> > word before the restart.
> >
>
> Please note the default chain selection rules as described at [1]. You
> can also access chains under '/enhancer/chain/{chain-name}'.
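
(Side note: a minimal, untested sketch of calling that endpoint from Java. It
assumes a local launcher on the default port 8080 and a chain actually named
'default'; nothing here is Stanbol API, it is plain java.net:)

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ChainCheck {
        public static void main(String[] args) throws Exception {
            // hypothetical host/port; adjust to your launcher
            URL url = new URL("http://localhost:8080/enhancer/chain/default");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/plain");
            con.setRequestProperty("Accept", "application/rdf+xml");
            try (OutputStream out = con.getOutputStream()) {
                out.write("Angela Merkel visited China.".getBytes("UTF-8"));
            }
            int status = con.getResponseCode();
            System.out.println("HTTP " + status); // 404 suggests no chain with that name is active
            if (status == 200) {
                try (InputStream in = con.getInputStream()) {
                    int b;
                    while ((b = in.read()) != -1) { // dump the enhancement RDF to stdout
                        System.out.write(b);
                    }
                    System.out.flush();
                }
            }
        }
    }

A 404 from that URL is a quick way to confirm that no chain with the given
name is currently active.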
>
> best
> Rupert
>
> [1]
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>
> > It looks like the config files are exactly what I need. Thanks.
> >
> >
> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> >
> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >> <cristian.petro...@gmail.com> wrote:
> >> > Thanks Rupert.
> >> >
> >> > A couple more questions/issues :
> >> >
> >> > 1. Whenever I start the stanbol server I'm seeing this in the console
> >> > output :
> >> >
> >>
> >> This should be fixed with STANBOL-1278 [1] [2]
> >>
> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> >> > usually use the 'default' chain and add my engine to it so there are 11
> >> > engines in it. After the restart this chain now contains around 23
> >> > engines in total.
> >>
> >> I was not able to replicate this. What I tried was
> >>
> >> (1) start up the stable launcher
> >> (2) add an additional engine to the default chain
> >> (3) restart the launcher
> >>
> >> The default chain was not changed after (2) and (3), so I would need
> >> further information to understand why this is happening.
> >>
> >> Generally it is better to create your own chain instance rather than
> >> modifying one that is provided by the default configuration. I would also
> >> recommend that you keep your test configuration in text files and copy
> >> those to the 'stanbol/fileinstall' folder. Doing so prevents you from
> >> having to manually re-enter the configuration after a software update. The
> >> production-mode section [3] provides information on how to do that.
> >>
> >> best
> >> Rupert
> >>
> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >> [2] http://svn.apache.org/r1576623
> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
> >>
> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
> >> >         at java.lang.Thread.run(Unknown Source)
> >> >
> >> > Despite this, the server starts fine and I can use the enhancer fine.
> >> > Do you guys see this as well?
> >> >
> >> >
> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> >> > usually use the 'default' chain and add my engine to it so there are 11
> >> > engines in it. After the restart this chain now contains around 23
> >> > engines in total.
> >> >
> >> >
> >> >
> >> >
> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> >> >
> >> >> Hi Cristian,
> >> >>
> >> >> NER Annotations are typically available as both
> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
> >> >> enhancement metadata. As you are already accessing the AnalysedText I
> >> >> would prefer using the NlpAnnotations.NER_ANNOTATION.
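
(For reference, an untested sketch of the NER variant, modelled on the phrase
demo quoted further down in this thread. It assumes the same AnalysedText API,
that NER annotations are attached to Chunk spans, and that NerTag exposes
getType() on the annotation value:)

    // assumes 'ci' (ContentItem) and 'log' as in the phrase demo further down
    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    Iterator<Span> spans = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
    while (spans.hasNext()) {
        Span span = spans.next();
        Value<NerTag> ner = span.getAnnotation(NlpAnnotations.NER_ANNOTATION);
        if (ner != null) { // only spans representing named entities carry this annotation
            log.info(" - NamedEntity [{},{}] {} (type: {})", new Object[]{
                    span.getStart(), span.getEnd(), span.getSpan(), ner.value().getType()});
        }
    }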
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >> >>
> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> >> <cristian.petro...@gmail.com> wrote:
> >> >> > Thanks.
> >> >> > I assume I should get the Named entities using the same but with
> >> >> > NlpAnnotations.NER_ANNOTATION?
> >> >> >
> >> >> >
> >> >> >
> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> >> > rupert.westentha...@gmail.com>:
> >> >> >
> >> >> >> Hallo Cristian,
> >> >> >>
> >> >> >> NounPhrases are not added to the RDF enhancement results. You need
> >> >> >> to use the AnalyzedText ContentPart [1]
> >> >> >>
> >> >> >> here is some demo code you can use in the computeEnhancement method
> >> >> >>
> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
> >> >> >>         if(!sections.hasNext()){ //process as single sentence
> >> >> >>             sections = Collections.singleton(at).iterator();
> >> >> >>         }
> >> >> >>
> >> >> >>         while(sections.hasNext()){
> >> >> >>             Section section = sections.next();
> >> >> >>             Iterator<Span> chunks =
> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >> >>             while(chunks.hasNext()){
> >> >> >>                 Span chunk = chunks.next();
> >> >> >>                 Value<PhraseTag> phrase =
> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >> >>                 //not every chunk has a phrase annotation, so guard against null
> >> >> >>                 if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >> >>                 }
> >> >> >>             }
> >> >> >>         }
> >> >> >>
> >> >> >> hope this helps
> >> >> >>
> >> >> >> best
> >> >> >> Rupert
> >> >> >>
> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >> >>
> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> >> <cristian.petro...@gmail.com> wrote:
> >> >> >> > I started to implement the engine and I'm having problems with
> >> >> >> > getting results for noun phrases. I modified the "default" weighted
> >> >> >> > chain to also include the PosChunkerEngine and ran a sample text :
> >> >> >> > "Angela Merkel visited China. The German chancellor met with various
> >> >> >> > people". I expected that the RDF XML output would contain some info
> >> >> >> > about the noun phrases but I cannot see any.
> >> >> >> > Could you point me to the correct way to generate the noun phrases?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Cristian
> >> >> >> >
> >> >> >> >
> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >> >
> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >> >>
> >> >> >> >>> Hi Rupert,
> >> >> >> >>>
> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
> >> >> >> >>> Yago.
> >> >> >> >>>
> >> >> >> >>> I will create a Jira with what we talked about here. It will
> >> >> >> >>> probably have just a draft-like description for now and will be
> >> >> >> >>> updated as I go along.
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Cristian
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >> >>> rupert.westentha...@gmail.com>:
> >> >> >> >>>
> >> >> >> >>>> Hi Cristian,
> >> >> >> >>>>
> >> >> >> >>>> definitely an interesting approach. You should have a look at
> >> >> >> >>>> Yago2 [1]. As far as I can remember the Yago taxonomy is much
> >> >> >> >>>> better structured than the one used by dbpedia. Mapping
> >> >> >> >>>> suggestions of dbpedia to concepts in Yago2 is easy as both
> >> >> >> >>>> dbpedia and yago2 do provide mappings [2] and [3].
> >> >> >> >>>>
> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >>
> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
> >> made
> >> >> a
> >> >> >> >>>> >> huge profit".
> >> >> >> >>>>
> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
> >> >> >> >>>> important as they tend to be often used for referencing. So I
> >> >> >> >>>> would suggest to specially treat the spatial context. For spatial
> >> >> >> >>>> Entities (like a City) this is easy, but even for others (like a
> >> >> >> >>>> Person or Company) you could use relations to spatial entities to
> >> >> >> >>>> define their spatial context. This context could then be used to
> >> >> >> >>>> correctly link "The Redmond's company" to "Microsoft".
> >> >> >> >>>>
> >> >> >> >>>> In addition I would suggest to use the "spatial" context of each
> >> >> >> >>>> entity (basically relations to entities that are cities, regions,
> >> >> >> >>>> countries) as a separate dimension, because those are very often
> >> >> >> >>>> used for coreferences.
> >> >> >> >>>>
> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >> >>>> <cristian.petro...@gmail.com> wrote:
> >> >> >> >>>> > There are several dbpedia categories for each entity, in this
> >> >> >> >>>> > case for Microsoft we have :
> >> >> >> >>>> >
> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >> >>>> > category:Microsoft
> >> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >> >> >>>> > category:Companies_established_in_1975
> >> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
> >> >> >> >>>> > category:Cloud_computing_providers
> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >> >>>> >
> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which
> >> >> >> >>>> > could be matched.
> >> >> >> >>>> >
> >> >> >> >>>> >
> >> >> >> >>>> > There is still other contextual information from dbpedia which
> >> >> >> >>>> > can be used.
> >> >> >> >>>> > For example for an Organization we could also include :
> >> >> >> >>>> > dbpprop:industry = Software
> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >> >>>> >
> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >> >>>> >
> >> >> >> >>>> > dbpedia-owl:profession:
> >> >> >> >>>> >                                dbpedia:Author
> >> >> >> >>>> >                                dbpedia:Constitutional_law
> >> >> >> >>>> >                                dbpedia:Lawyer
> >> >> >> >>>> >                                dbpedia:Community_organizing
> >> >> >> >>>> >
> >> >> >> >>>> > I'd like to continue investigating this as I think that it may
> >> >> >> >>>> > have some value in increasing the number of coreference
> >> >> >> >>>> > resolutions and I'd like to concentrate more on precision rather
> >> >> >> >>>> > than recall since we already have a set of coreferences detected
> >> >> >> >>>> > by the stanford nlp tool and this would be an addition to that
> >> >> >> >>>> > (at least this is how I would like to use it).
> >> >> >> >>>> >
> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to
> >> >> >> >>>> > show my progress and also my conclusions, and if it turns out
> >> >> >> >>>> > that it was a bad idea then that's the situation; at least I'll
> >> >> >> >>>> > end up with more knowledge about Stanbol in the end :).
> >> >> >> >>>> >
> >> >> >> >>>> >
> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >
> >> >> >> >>>> >> Hi Cristian,
> >> >> >> >>>> >>
> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
> >> >> >> >>>> >> advocate but I'm just not sure about the recall using the
> >> >> >> >>>> >> dbpedia categories feature. For example, your sentence could
> >> >> >> >>>> >> also be "Microsoft posted its 2013 earnings. The Redmond's
> >> >> >> >>>> >> company made a huge profit". So, maybe including more
> >> >> >> >>>> >> contextual information from dbpedia could increase the recall
> >> >> >> >>>> >> but of course will reduce the precision.
> >> >> >> >>>> >>
> >> >> >> >>>> >> Cheers,
> >> >> >> >>>> >> Rafa
> >> >> >> >>>> >>
> >> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
> >> >> >> >>>> >>
> >> >> >> >>>> >>> Back with a more detailed description of the steps for making
> >> >> >> >>>> >>> this kind of coreference work.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> I will be using references to the following text in the steps
> >> >> >> >>>> >>> below in order to make things clearer : "Microsoft posted its
> >> >> >> >>>> >>> 2013 earnings. The software company made a huge profit."
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >> >> >>>> >>>      a. a determinate pos which implies a reference to an
> >> >> >> >>>> >>> entity local to the text (such as "the, this, these"), but not
> >> >> >> >>>> >>> "another, every", etc., which imply a reference to an entity
> >> >> >> >>>> >>> outside of the text.
> >> >> >> >>>> >>>      b. at least another noun aside from the main required
> >> >> >> >>>> >>> noun which further describes it. For example I will not count
> >> >> >> >>>> >>> "The company" as being a legitimate candidate since this could
> >> >> >> >>>> >>> create a lot of false positives by considering the double
> >> >> >> >>>> >>> meaning of some words such as "in the company of good people".
> >> >> >> >>>> >>> "The software company" is a good candidate since we also have
> >> >> >> >>>> >>> "software".
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 2. Match the nouns in the noun phrase to the contents of the
> >> >> >> >>>> >>> dbpedia categories of each named entity found prior to the
> >> >> >> >>>> >>> location of the noun phrase in the text.
> >> >> >> >>>> >>> The dbpedia categories are in the following format (for
> >> >> >> >>>> >>> Microsoft for example) : "Software companies of the United
> >> >> >> >>>> >>> States". So we try to match "software company" with that.
> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category
> >> >> >> >>>> >>> has a plural form and it's the same for all categories which I
> >> >> >> >>>> >>> saw. I don't know if there's an easier way to do this but I
> >> >> >> >>>> >>> thought of applying a lemmatizer on the category and the noun
> >> >> >> >>>> >>> phrase in order for them to have a common denominator. This
> >> >> >> >>>> >>> also works if the noun phrase itself has a plural form.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words in the
> >> >> >> >>>> >>> category which are themselves nouns and not prepositions or
> >> >> >> >>>> >>> determiners such as "of the". This means that I need to pos
> >> >> >> >>>> >>> tag the categories' contents as well.
> >> >> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
> >> >> >> >>>> >>> categories when building the dbpedia backed entity hub and
> >> >> >> >>>> >>> storing them for later use - I don't know how feasible this is
> >> >> >> >>>> >>> at the moment.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the
> >> >> >> >>>> >>> equivalent nouns in the categories and based on the number of
> >> >> >> >>>> >>> matches I can create a confidence level.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 3. Match the noun of the noun phrase with the rdf:type from
> >> >> >> >>>> >>> dbpedia of the named entity. If this matches, increase the
> >> >> >> >>>> >>> confidence level.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a
> >> >> >> >>>> >>> certain noun phrase then link the noun phrase with the closest
> >> >> >> >>>> >>> named entity prior to it in the text.
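
(To make steps 2-4 above a bit more concrete, here is a rough, untested sketch
of the category/type matching with a toy confidence score. All class and input
names are hypothetical, the boost value is arbitrary, and it assumes the nouns
were already POS-tagged and lemmatized as described in the steps:)

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Sketch only: inputs are lower-cased noun lemmas produced by the
     *  pos/lemma pre-processing described in steps 1-2 above. */
    public class CategoryMatchSketch {

        /** Confidence that a noun phrase (e.g. ["software","company"]) refers
         *  to an entity with the given category noun lemmas and rdf:type labels. */
        public static double confidence(Set<String> phraseNouns,
                                        List<Set<String>> categoryNouns,
                                        Set<String> typeLabels) {
            double best = 0.0;
            for (Set<String> category : categoryNouns) {
                Set<String> common = new HashSet<String>(phraseNouns);
                common.retainAll(category); // nouns the phrase shares with this category
                best = Math.max(best, (double) common.size() / phraseNouns.size());
            }
            // step 3: boost if a phrase noun also matches an rdf:type label
            if (!Collections.disjoint(phraseNouns, typeLabels)) {
                best = Math.min(1.0, best + 0.25); // arbitrary boost, to be tuned
            }
            return best;
        }

        public static void main(String[] args) {
            Set<String> phrase = new HashSet<String>(Arrays.asList("software", "company"));
            List<Set<String>> categories = new ArrayList<Set<String>>();
            categories.add(new HashSet<String>(Arrays.asList("software", "company", "united", "state")));
            categories.add(new HashSet<String>(Arrays.asList("company", "redmond", "washington")));
            Set<String> types = new HashSet<String>(Arrays.asList("company"));
            System.out.println(confidence(phrase, categories, types)); // prints 1.0 for the Microsoft example
        }
    }

Step 4 (preferring the closest preceding entity when several candidates score
equally) would then sit on top of this score.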
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> What do you think?
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> Cristian
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >> >>>> >>>
> >> >> >> >>>> >>>> Hi Rafa,
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working on it.
> >> >> >> >>>> >>>> I'll provide it here so that you guys can give me feedback on
> >> >> >> >>>> >>>> it.
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> What are "locality" features?
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
> >> >> >> >>>> >>>> CherryPicker and they don't provide such a coreference.
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> Cristian
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>>> Hi Cristian,
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>> Without having more details about your concrete heuristic,
> >> >> >> >>>> >>>>> in my honest opinion such an approach could produce a lot of
> >> >> >> >>>> >>>>> false positives. I don't know if you are planning to use
> >> >> >> >>>> >>>>> some "locality" features to detect such coreferences but you
> >> >> >> >>>> >>>>> need to take into account that it is quite usual that
> >> >> >> >>>> >>>>> coreferenced mentions can occur even in different
> >> >> >> >>>> >>>>> paragraphs. Although I'm not an expert in Natural Language
> >> >> >> >>>> >>>>> Understanding, I would say it is quite difficult to get
> >> >> >> >>>> >>>>> decent precision/recall rates for coreferencing using fixed
> >> >> >> >>>> >>>>> rules. Maybe you can give a try to other tools like BART
> >> >> >> >>>> >>>>> (http://www.bart-coref.org/).
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>> Cheers,
> >> >> >> >>>> >>>>> Rafa Haro
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>>> Hi,
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
> >> >> >> >>>> >>>>>> Extraction Engine feature
> >> >> >> >>>> >>>>>> (https://issues.apache.org/jira/browse/STANBOL-1121) is to
> >> >> >> >>>> >>>>>> have coreference resolution in the given text. This is
> >> >> >> >>>> >>>>>> provided now via the stanford-nlp project but as far as I
> >> >> >> >>>> >>>>>> saw this module is performing mostly pronominal (He, She)
> >> >> >> >>>> >>>>>> or nominal (Barack Obama and Mr. Obama) coreference
> >> >> >> >>>> >>>>>> resolution.
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I thought
> >> >> >> >>>> >>>>>> of creating some logic that would detect this kind of
> >> >> >> >>>> >>>>>> coreference :
> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company
> >> >> >> >>>> >>>>>> just announced its 2013 earnings."
> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which
> >> >> >> >>>> >>>>>> are of the rdf:type of the Named Entity, in this case
> >> >> >> >>>> >>>>>> "company", and also have attributes which can be found in
> >> >> >> >>>> >>>>>> the dbpedia categories of the named entity, in this case
> >> >> >> >>>> >>>>>> "software".
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software
> >> >> >> >>>> >>>>>> company" in the text would also be done by either using
> >> >> >> >>>> >>>>>> the new Pos Tag Based Phrase extraction Engine (noun
> >> >> >> >>>> >>>>>> phrases) or by using a dependency tree of the sentence and
> >> >> >> >>>> >>>>>> picking up only subjects or objects.
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would
> >> >> >> >>>> >>>>>> be useful as a separate Enhancement Engine (in case the
> >> >> >> >>>> >>>>>> precision and recall are good enough) in Stanbol?
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> Thanks,
> >> >> >> >>>> >>>>>> Cristian
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> --
> >> >> >> >>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> >> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> >> >>>> | A-5500 Bischofshofen
> >> >> >> >>>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> >> | A-5500 Bischofshofen
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
