Hi Cristian

On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
<cristian.petro...@gmail.com> wrote:
> 1. Updated to the latest code and it's gone. Cool
>
> 2. I start the stable launcher -> create a new instance of the
> PosChunkerEngine -> add it to the default chain. At this point everything
> looks good and works ok.
> After I restart the server the default chain is gone and instead I see this
> in the enhancement chains page : all-active (default, id: 149, ranking: 0,
> impl: AllActiveEnginesChain ). all-active did not contain the 'default'
> word before the restart.
>

Please note the default chain selection rules as described at [1]. You
can also access chains under '/enhancer/chain/{chain-name}'.
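
If you want to quickly check what a given chain returns you can POST plain
text directly to that endpoint. Here is a minimal, untested Java sketch
(host, port and the 'default' chain name are assumptions for the stable
launcher; any HTTP client works just as well):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class ChainRequestDemo {
        public static void main(String[] args) throws Exception {
            // assumption: stable launcher on localhost:8080, chain named 'default'
            URL url = new URL("http://localhost:8080/enhancer/chain/default");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/plain;charset=UTF-8");
            con.setRequestProperty("Accept", "application/rdf+xml");
            try (OutputStream out = con.getOutputStream()) {
                out.write("Angela Merkel visited China.".getBytes(StandardCharsets.UTF_8));
            }
            // print the enhancement results returned by the chain
            try (InputStream in = con.getInputStream()) {
                byte[] buf = new byte[4096];
                int len;
                while ((len = in.read(buf)) != -1) {
                    System.out.write(buf, 0, len);
                }
                System.out.flush();
            }
        }
    }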

best
Rupert

[1] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain

> It looks like the config files are exactly what I need. Thanks.
>
>
> 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com
>>:
>
>> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> <cristian.petro...@gmail.com> wrote:
>> > Thanks Rupert.
>> >
>> > A couple more questions/issues :
>> >
>> > 1. Whenever I start the stanbol server I'm seeing this in the console
>> > output :
>> >
>>
>> This should be fixed with STANBOL-1278 [1] [2]
>>
>> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> > usually use the 'default' chain and add my engine to it so there are 11
>> > engines in it. After the restart this chain now contains around 23
>> engines
>> > in total.
>>
>> I was not able to replicate this. What I tried was
>>
>> (1) start up the stable launcher
>> (2) add an additional engine to the default chain
>> (3) restart the launcher
>>
>> The default chain was not changed after (2) and (3). So I would need
>> further information to understand why this is happening.
>>
>> Generally it is better to create your own chain instance rather than
>> modifying one that is provided by the default configuration. I would also
>> recommend that you keep your test configuration in text files and copy
>> those to the 'stanbol/fileinstall' folder. Doing so prevents you from
>> having to manually re-enter the configuration after a software update. The
>> production-mode section [3] provides information on how to do that.
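>>
>> As an example, the chain configuration exported from the Felix console can
>> be kept as a file like
>> 'org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-mychain.config'
>> in 'stanbol/fileinstall'. A sketch from memory - please verify the factory
>> PID, the property names and the engine names against a configuration you
>> export yourself ('my-engine' is just a placeholder):
>>
>>     stanbol.enhancer.chain.name="my-test-chain"
>>     stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker","my-engine"]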
>>
>> best
>> Rupert
>>
>> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> [2] http://svn.apache.org/r1576623
>> [3] http://stanbol.apache.org/docs/trunk/production-mode
>>
>> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>> >         at java.lang.Thread.run(Unknown Source)
>> >
>> > Despite this the server starts fine and I can use the enhancer without
>> > problems. Do you guys see this as well?
>> >
>> >
>> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> > usually use the 'default' chain and add my engine to it so there are 11
>> > engines in it. After the restart this chain now contains around 23
>> engines
>> > in total.
>> >
>> >
>> >
>> >
>> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> rupert.westentha...@gmail.com
>> >>:
>> >
>> >> Hi Cristian,
>> >>
>> >> NER Annotations are typically available as both
>> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>> >> enhancement metadata. As you are already accessing the AnalysedText I
>> >> would prefer using the NlpAnnotations.NER_ANNOTATION.
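>> >>
>> >> An untested sketch of reading them, following the same pattern as the
>> >> chunk example from my earlier mail (assuming the NER spans are available
>> >> as Chunks in the AnalysedText):
>> >>
>> >>     Iterator<Span> chunks = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >>     while(chunks.hasNext()){
>> >>         Span chunk = chunks.next();
>> >>         Value<NerTag> ner = chunk.getAnnotation(NlpAnnotations.NER_ANNOTATION);
>> >>         if(ner != null){ //only chunks that represent a named entity
>> >>             log.info(" - NamedEntity [{},{}] {} type: {}", new Object[]{
>> >>                 chunk.getStart(), chunk.getEnd(), chunk.getSpan(),
>> >>                 ner.value().getType()});
>> >>         }
>> >>     }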
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1]
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >>
>> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> <cristian.petro...@gmail.com> wrote:
>> >> > Thanks.
>> >> > I assume I should get the Named entities using the same but with
>> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >
>> >> >
>> >> >
>> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> > rupert.westentha...@gmail.com>:
>> >> >
>> >> >> Hallo Cristian,
>> >> >>
>> >> >> NounPhrases are not added to the RDF enhancement results. You need to
>> >> >> use the AnalyzedText ContentPart [1]
>> >> >>
>> >> >> here is some demo code you can use in the computeEnhancement method
>> >> >>
>> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >> >>         if(!sections.hasNext()){ //process as single sentence
>> >> >>             sections = Collections.singleton(at).iterator();
>> >> >>         }
>> >> >>
>> >> >>         while(sections.hasNext()){
>> >> >>             Section section = sections.next();
>> >> >>             Iterator<Span> chunks =
>> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >>             while(chunks.hasNext()){
>> >> >>                 Span chunk = chunks.next();
>> >> >>                 Value<PhraseTag> phrase =
>> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >>                 //chunks without a phrase annotation are skipped
>> >> >>                 if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
>> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >> >>                         chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
>> >> >>                 }
>> >> >>             }
>> >> >>         }
>> >> >>
>> >> >> hope this helps
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> [1]
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >>
>> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> <cristian.petro...@gmail.com> wrote:
>> >> >> > I started to implement the engine and I'm having problems with
>> getting
>> >> >> > results for noun phrases. I modified the "default" weighted chain
>> to
>> >> also
>> >> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
>> >> >> > visited China. The German chancellor met with various people". I
>> >> >> > expected that the RDF XML output would contain some info about the
>> >> >> > noun phrases but I cannot see any.
>> >> >> > Could you point me to the correct way to generate the noun phrases?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Cristian
>> >> >> >
>> >> >> >
>> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> cristian.petro...@gmail.com>:
>> >> >> >
>> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >>
>> >> >> >>
>> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> cristian.petro...@gmail.com>
>> >> >> >> :
>> >> >> >>
>> >> >> >> Hi Rupert,
>> >> >> >>>
>> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
>> >> Yago.
>> >> >> >>>
>> >> >> >>> I will create a Jira with what we talked about here. It will
>> >> probably
>> >> >> >>> have just a draft-like description for now and will be updated
>> as I
>> >> go
>> >> >> >>> along.
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Cristian
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >>> rupert.westentha...@gmail.com>:
>> >> >> >>>
>> >> >> >>> Hi Cristian,
>> >> >> >>>>
>> >> >> >>>> definitely an interesting approach. You should have a look at Yago2
>> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
>> >> >> >>>> structured than the one used by dbpedia. Mapping suggestions of
>> >> >> >>>> dbpedia to concepts in Yago2 is easy as both dbpedia and yago2
>> >> >> >>>> provide mappings [2] and [3].
>> >> >> >>>>
>> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >>
>> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
>> made
>> >> a
>> >> >> >>>> >> huge profit".
>> >> >> >>>>
>> >> >> >>>> That's actually a very good example. Spatial contexts are very
>> >> >> >>>> important as they tend to be often used for referencing. So I would
>> >> >> >>>> suggest treating the spatial context specially. For spatial Entities
>> >> >> >>>> (like a City) this is easy, but even for others (like a Person or
>> >> >> >>>> Company) you could use relations to spatial entities to define their
>> >> >> >>>> spatial context. This context could then be used to correctly link
>> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >>>>
>> >> >> >>>> In addition I would suggest using the "spatial" context of each
>> >> >> >>>> entity (basically relations to entities that are cities, regions or
>> >> >> >>>> countries) as a separate dimension, because those are very often
>> >> >> >>>> used for coreferences.
>> >> >> >>>>
>> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >>>> [3]
>> >> >> >>>>
>> >> >>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >>>> <cristian.petro...@gmail.com> wrote:
>> >> >> >>>> > There are several dbpedia categories for each entity, in this
>> >> case
>> >> >> for
>> >> >> >>>> > Microsoft we have :
>> >> >> >>>> >
>> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >>>> > category:Microsoft
>> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >>>> >
>> >> category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >>>> >
>> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which
>> >> >> >>>> > could be matched.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> > There is still other contextual information from dbpedia which
>> >> can
>> >> >> be
>> >> >> >>>> used.
>> >> >> >>>> > For example for an Organization we could also include :
>> >> >> >>>> > dbpprop:industry = Software
>> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >>>> >
>> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >>>> >
>> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >>>> >                                dbpedia:Author
>> >> >> >>>> >                                dbpedia:Constitutional_law
>> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >>>> >                                dbpedia:Community_organizing
>> >> >> >>>> >
>> >> >> >>>> > I'd like to continue investigating this as I think that it may
>> >> >> >>>> > have some value in increasing the number of coreference
>> >> >> >>>> > resolutions. I'd like to concentrate more on precision rather than
>> >> >> >>>> > recall, since we already have a set of coreferences detected by
>> >> >> >>>> > the stanford nlp tool and this would be an addition to that (at
>> >> >> >>>> > least this is how I would like to use it).
>> >> >> >>>> >
>> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to
>> >> >> >>>> > show my progress and also my conclusions, and if it turns out that
>> >> >> >>>> > it was a bad idea then that's the situation; at least I'll end up
>> >> >> >>>> > with more knowledge about Stanbol in the end :).
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >
>> >> >> >>>> >> Hi Cristian,
>> >> >> >>>> >>
>> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>> >> advocate
>> >> >> but
>> >> >> >>>> I'm
>> >> >> >>>> >> just not sure about the recall using the dbpedia categories
>> >> >> feature.
>> >> >> >>>> For
>> >> >> >>>> >> example, your sentence could be also "Microsoft posted its
>> 2013
>> >> >> >>>> earnings.
>> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
>> including
>> >> more
>> >> >> >>>> >> contextual information from dbpedia could increase the recall
>> >> but
>> >> >> of
>> >> >> >>>> course
>> >> >> >>>> >> will reduce the precision.
>> >> >> >>>> >>
>> >> >> >>>> >> Cheers,
>> >> >> >>>> >> Rafa
>> >> >> >>>> >>
>> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
>> >> >> >>>> >>
>> >> >> >>>> >>  Back with a more detailed description of the steps for
>> making
>> >> this
>> >> >> >>>> kind of
>> >> >> >>>> >>> coreference work.
>> >> >> >>>> >>>
>> >> >> >>>> >>> I will be using references to the following text in the
>> steps
>> >> >> below
>> >> >> >>>> in
>> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
>> >> >> earnings.
>> >> >> >>>> The
>> >> >> >>>> >>> software company made a huge profit."
>> >> >> >>>> >>>
>> >> >> >>>> >>> 1. For every noun phrase in the text which has :
>> >> >> >>>> >>>      a. a determiner PoS which implies reference to an entity
>> >> >> >>>> >>> local to the text (such as "the, this, these") but not "another,
>> >> >> >>>> >>> every", etc. which implies a reference to an entity outside of
>> >> >> >>>> >>> the text.
>> >> >> >>>> >>>      b. at least another noun aside from the main required noun
>> >> >> >>>> >>> which further describes it. For example I will not count "The
>> >> >> >>>> >>> company" as a legitimate candidate since this could create a lot
>> >> >> >>>> >>> of false positives by considering the double meaning of some
>> >> >> >>>> >>> words such as "in the company of good people".
>> >> >> >>>> >>> "The software company" is a good candidate since we also have
>> >> >> >>>> >>> "software".
>> >> >> >>>> >>>
>> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
>> >> >> >>>> >>> dbpedia categories of each named entity found prior to the
>> >> >> >>>> >>> location of the noun phrase in the text (a rough sketch of this
>> >> >> >>>> >>> matching follows below).
>> >> >> >>>> >>> The dbpedia categories are in the following format (for Microsoft
>> >> >> >>>> >>> for example) : "Software companies of the United States".
>> >> >> >>>> >>> So we try to match "software company" with that.
>> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category has
>> >> >> >>>> >>> a plural form and it's the same for all categories which I saw. I
>> >> >> >>>> >>> don't know if there's an easier way to do this but I thought of
>> >> >> >>>> >>> applying a lemmatizer on the category and the noun phrase in
>> >> >> >>>> >>> order for them to have a common denominator. This also works if
>> >> >> >>>> >>> the noun phrase itself has a plural form.
>> >> >> >>>> >>>
>> >> >> >>>> >>> Second, I'll need to use for comparison only the words in the
>> >> >> >>>> >>> category which are themselves nouns and not prepositions or
>> >> >> >>>> >>> determiners such as "of the". This means that I need to pos tag
>> >> >> >>>> >>> the categories' contents as well.
>> >> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
>> >> >> >>>> >>> categories when building the dbpedia backed entity hub and
>> >> >> >>>> >>> storing them for later use - I don't know how feasible this is at
>> >> >> >>>> >>> the moment.
>> >> >> >>>> >>>
>> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the
>> >> >> >>>> >>> equivalent nouns in the categories and based on the number of
>> >> >> >>>> >>> matches I can create a confidence level.
>> >> >> >>>> >>>
>> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
>> >> >> >>>> >>> dbpedia of the named entity. If this matches, increase the
>> >> >> >>>> >>> confidence level.
>> >> >> >>>> >>>
>> >> >> >>>> >>> 4. If there are multiple named entities which can match a certain
>> >> >> >>>> >>> noun phrase then link the noun phrase with the closest named
>> >> >> >>>> >>> entity prior to it in the text.
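>> >> >> >>>> >>>
>> >> >> >>>> >>> A rough sketch of the matching in step 2, just to make the idea
>> >> >> >>>> >>> concrete (the normalize() method below is only a naive stand-in
>> >> >> >>>> >>> for a real lemmatizer and all names are made up; assumes the
>> >> >> >>>> >>> usual java.util imports):
>> >> >> >>>> >>>
>> >> >> >>>> >>>     //naive stand-in for a lemmatizer
>> >> >> >>>> >>>     static String normalize(String word){
>> >> >> >>>> >>>         String w = word.toLowerCase();
>> >> >> >>>> >>>         if(w.endsWith("ies")){
>> >> >> >>>> >>>             return w.substring(0, w.length() - 3) + "y"; //companies -> company
>> >> >> >>>> >>>         } else if(w.endsWith("s") && w.length() > 3){
>> >> >> >>>> >>>             return w.substring(0, w.length() - 1); //earnings -> earning
>> >> >> >>>> >>>         }
>> >> >> >>>> >>>         return w;
>> >> >> >>>> >>>     }
>> >> >> >>>> >>>
>> >> >> >>>> >>>     //fraction of the noun phrase nouns that also occur in the category label
>> >> >> >>>> >>>     static double matchScore(List<String> phraseNouns, String categoryLabel){
>> >> >> >>>> >>>         Set<String> categoryWords = new HashSet<String>();
>> >> >> >>>> >>>         for(String word : categoryLabel.split("[\\s_]+")){
>> >> >> >>>> >>>             categoryWords.add(normalize(word));
>> >> >> >>>> >>>         }
>> >> >> >>>> >>>         int matches = 0;
>> >> >> >>>> >>>         for(String noun : phraseNouns){
>> >> >> >>>> >>>             if(categoryWords.contains(normalize(noun))){
>> >> >> >>>> >>>                 matches++;
>> >> >> >>>> >>>             }
>> >> >> >>>> >>>         }
>> >> >> >>>> >>>         return phraseNouns.isEmpty() ? 0d : (double) matches / phraseNouns.size();
>> >> >> >>>> >>>     }
>> >> >> >>>> >>>
>> >> >> >>>> >>> matchScore(Arrays.asList("software", "company"), "Software
>> >> >> >>>> >>> companies of the United States") would then give 1.0, and this
>> >> >> >>>> >>> score could be combined with the rdf:type check from step 3 into
>> >> >> >>>> >>> the confidence level.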
>> >> >> >>>> >>>
>> >> >> >>>> >>> What do you think?
>> >> >> >>>> >>>
>> >> >> >>>> >>> Cristian
>> >> >> >>>> >>>
>> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
>> >> >> >>>> >>>
>> >> >> >>>> >>>  Hi Rafa,
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working on it.
>> >> >> >>>> >>>> I'll provide it here so that you guys can give me feedback on it.
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> What are "locality" features?
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> >> >> >>>> CherryPicker
>> >> >> >>>> >>>> and
>> >> >> >>>> >>>> they don't provide such a coreference.
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> Cristian
>> >> >> >>>> >>>>
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> Hi Cristian,
>> >> >> >>>> >>>>
>> >> >> >>>> >>>>> Without having more details about your concrete heuristic,
>> >> >> >>>> >>>>> in my honest opinion, such an approach could produce a lot of
>> >> >> >>>> >>>>> false positives. I don't know if you are planning to use some
>> >> >> >>>> >>>>> "locality" features to detect such coreferences, but you need
>> >> >> >>>> >>>>> to take into account that it is quite usual that coreferenced
>> >> >> >>>> >>>>> mentions can occur even in different paragraphs. Although I'm
>> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I would say it
>> >> >> >>>> >>>>> is quite difficult to get decent precision/recall rates for
>> >> >> >>>> >>>>> coreferencing using fixed rules. Maybe you can give a try to
>> >> >> >>>> >>>>> other tools like BART (http://www.bart-coref.org/).
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>> Cheers,
>> >> >> >>>> >>>>> Rafa Haro
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>>   Hi,
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>> >> >> >>>> >>>>>> extraction Engine feature
>> >> >> >>>> >>>>>> (https://issues.apache.org/jira/browse/STANBOL-1121) is to
>> >> >> >>>> >>>>>> have coreference resolution in the given text. This is
>> >> >> >>>> >>>>>> provided now via the stanford-nlp project but as far as I saw
>> >> >> >>>> >>>>>> this module performs mostly pronominal (He, She) or nominal
>> >> >> >>>> >>>>>> (Barack Obama and Mr. Obama) coreference resolution.
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> In order to get more coreferences from the text I thought of
>> >> >> >>>> >>>>>> creating some logic that would detect this kind of coreference :
>> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
>> >> >> >>>> >>>>>> announced its 2013 earnings."
>> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which are
>> >> >> >>>> >>>>>> of the rdf:type of the Named Entity, in this case "company",
>> >> >> >>>> >>>>>> and also have attributes which can be found in the dbpedia
>> >> >> >>>> >>>>>> categories of the named entity, in this case "software".
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> The detection of coreferences such as "The software
>> >> company" in
>> >> >> >>>> the
>> >> >> >>>> >>>>>> text
>> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag Based
>> >> Phrase
>> >> >> >>>> >>>>>> extraction
>> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of
>> the
>> >> >> >>>> sentence and
>> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
>> would
>> >> be
>> >> >> >>>> useful
>> >> >> >>>> >>>>>> as a
>> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
>> >> recall
>> >> >> are
>> >> >> >>>> good
>> >> >> >>>> >>>>>> enough) in Stanbol?
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> Thanks,
>> >> >> >>>> >>>>>> Cristian
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> >> >> >>>> | Bodenlehenstraße 11
>> >> ++43-699-11108907
>> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
