Re: Named entity coref resolution based on dbpedia categories and rdf:type

Rupert Westenthaler Wed, 19 Mar 2014 00:09:50 -0700

Hi Cristian,

can you provide the contents of the chain after your modifications?
It would be interesting to find out why the chain is no longer active
after the restart.


You can find the config file in the 'stanbol/fileinstall' folder.

best
Rupert

On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
<[email protected]> wrote:
> Related to the default chain selection rules: before the restart I had a
> chain with the name 'default', i.e. I could access it via
> enhancer/chain/default. Then I just added another engine to the 'default'
> chain. I assumed that the chain with the 'default' name would be persisted
> across the restart, so the first rule should have applied after the
> restart as well. Instead, I cannot reach it via enhancer/chain/default
> anymore, so it's gone.
> Anyway, this is not a big deal and it's not blocking me in any way; I just
> wanted to understand where the problem is.
>
>
> 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <[email protected]>:
>
>> Hi Cristian
>>
>> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> <[email protected]> wrote:
>> > 1. Updated to the latest code and it's gone. Cool.
>> >
>> > 2. I start the stable launcher -> create a new instance of the
>> > PosChunkerEngine -> add it to the default chain. At this point everything
>> > looks good and works OK.
>> > After I restart the server, the default chain is gone and instead I see
>> > this in the enhancement chains page: all-active (default, id: 149,
>> > ranking: 0, impl: AllActiveEnginesChain). all-active did not contain the
>> > 'default' word before the restart.
>> >
>>
>> Please note the default chain selection rules as described at [1]. You
>> can also access chains under '/enhancer/chain/{chain-name}'.
>>
>> best
>> Rupert
>>
>> [1]
>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>
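As an illustration of the '/enhancer/chain/{chain-name}' endpoint, here is a minimal Java 11 sketch that builds (but does not send) a request posting plain text to a named chain. The base URL http://localhost:8080 (the usual port of the stable launcher), the RDF/XML Accept header and the class name ChainRequest are assumptions for illustration, not part of the Stanbol API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ChainRequest {

    // Assumed base URL of a locally running stable launcher (hypothetical setup).
    static final String BASE = "http://localhost:8080/enhancer/chain/";

    /** Builds (but does not send) a POST of plain text to the named chain. */
    static HttpRequest forChain(String chainName, String text) {
        // chainName is assumed to be URL-safe, e.g. "default" or "mychain"
        return HttpRequest.newBuilder(URI.create(BASE + chainName))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "application/rdf+xml")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = forChain("default", "Angela Merkel visited China.");
        // Prints the HTTP method and the chain URL the request targets
        System.out.println(req.method() + " " + req.uri());
        // Sending it requires a running Stanbol instance, e.g.:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Running main prints "POST http://localhost:8080/enhancer/chain/default". Sending such a request against a running launcher should be a quick way to check whether a chain with a given name is currently reachable.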
>> > It looks like the config files are exactly what I need. Thanks.
>> >
>> >
>> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <[email protected]>:
>> >
>> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >> <[email protected]> wrote:
>> >> > Thanks Rupert.
>> >> >
>> >> > A couple more questions/issues:
>> >> >
>> >> > 1. Whenever I start the Stanbol server I'm seeing this in the console
>> >> > output:
>> >> >
>> >>
>> >> This should be fixed with STANBOL-1278 [1] [2]
>> >>
>> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> >> > usually use the 'default' chain and add my engine to it, so there are
>> >> > 11 engines in it. After the restart this chain now contains around 23
>> >> > engines in total.
>> >>
>> >> I was not able to replicate this. What I tried was
>> >>
>> >> (1) start up the stable launcher
>> >> (2) add an additional engine to the default chain
>> >> (3) restart the launcher
>> >>
>> >> The default chain was not changed after (2) and (3), so I would need
>> >> further information to understand why this is happening.
>> >>
>> >> Generally it is better to create your own chain instance than to modify
>> >> one that is provided by the default configuration. I would also
>> >> recommend that you keep your test configuration in text files and
>> >> copy those to the 'stanbol/fileinstall' folder. Doing so saves you
>> >> from having to re-enter the configuration manually after a software
>> >> update. The production-mode section [3] provides information on how to
>> >> do that.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >> [2] http://svn.apache.org/r1576623
>> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >>
>> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
>> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
>> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>> >> >         at java.lang.Thread.run(Unknown Source)
>> >> >
>> >> > Despite this, the server starts fine and I can use the enhancer without
>> >> > problems. Do you guys see this as well?
>> >> >
>> >> >
>> >> >
>> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <[email protected]>:
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> NER annotations are typically available both as
>> >> >> NlpAnnotations.NER_ANNOTATION and as fise:TextAnnotation [1] in the
>> >> >> enhancement metadata. As you are already accessing the AnalyzedText I
>> >> >> would prefer using NlpAnnotations.NER_ANNOTATION.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >> >>
>> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> >> <[email protected]> wrote:
>> >> >> > Thanks.
>> >> >> > I assume I should get the named entities the same way, but with
>> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <[email protected]>:
>> >> >> >
>> >> >> >> Hi Cristian,
>> >> >> >>
>> >> >> >> NounPhrases are not added to the RDF enhancement results. You need to
>> >> >> >> use the AnalyzedText ContentPart [1].
>> >> >> >>
>> >> >> >> Here is some demo code you can use in the computeEnhancement method:
>> >> >> >>
>> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >> >> >>         if(!sections.hasNext()){ //no sentences: process the text as a single section
>> >> >> >>             sections = Collections.singleton(at).iterator();
>> >> >> >>         }
>> >> >> >>
>> >> >> >>         while(sections.hasNext()){
>> >> >> >>             Section section = sections.next();
>> >> >> >>             Iterator<Span> chunks =
>> >> >> >>                     section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >> >>             while(chunks.hasNext()){
>> >> >> >>                 Span chunk = chunks.next();
>> >> >> >>                 Value<PhraseTag> phrase =
>> >> >> >>                         chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >> >>                 //getAnnotation may return null for chunks without a phrase tag
>> >> >> >>                 if(phrase != null
>> >> >> >>                         && phrase.value().getCategory() == LexicalCategory.Noun){
>> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >> >> >>                         chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
>> >> >> >>                 }
>> >> >> >>             }
>> >> >> >>         }
>> >> >> >>
>> >> >> >> hope this helps
>> >> >> >>
>> >> >> >> best
>> >> >> >> Rupert
>> >> >> >>
>> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >> >>
>> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> >> <[email protected]> wrote:
>> >> >> >> > I started to implement the engine and I'm having problems getting
>> >> >> >> > results for noun phrases. I modified the "default" weighted chain to
>> >> >> >> > also include the PosChunkerEngine and ran a sample text: "Angela
>> >> >> >> > Merkel visited China. The German chancellor met with various people".
>> >> >> >> > I expected that the RDF/XML output would contain some info about the
>> >> >> >> > noun phrases, but I cannot see any.
>> >> >> >> > Could you point me to the correct way to generate the noun phrases?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Cristian
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <[email protected]>:
>> >> >> >> >
>> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <[email protected]>:
>> >> >> >> >>
>> >> >> >> >>> Hi Rupert,
>> >> >> >> >>>
>> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>> >> >> >> >>>
>> >> >> >> >>> I will create a Jira issue with what we talked about here. It will
>> >> >> >> >>> probably have just a draft-like description for now and will be
>> >> >> >> >>> updated as I go along.
>> >> >> >> >>>
>> >> >> >> >>> Thanks,
>> >> >> >> >>> Cristian
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <[email protected]>:
>> >> >> >> >>>
>> >> >> >> >>>> Hi Cristian,
>> >> >> >> >>>>
>> >> >> >> >>>> Definitely an interesting approach. You should have a look at Yago2
>> >> >> >> >>>> [1]. As far as I can remember, the Yago taxonomy is much better
>> >> >> >> >>>> structured than the one used by dbpedia. Mapping dbpedia suggestions
>> >> >> >> >>>> to concepts in Yago2 is easy, as both dbpedia and Yago2 provide
>> >> >> >> >>>> mappings [2] and [3].
>> >> >> >> >>>>
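For the mapping suggested above, a rough Java sketch that reads dbpedia-to-YAGO links from N-Triples lines such as those in the yago_links.nt dump [2]. It assumes each relevant line is a single owl:sameAs triple with IRIs in angle brackets; the class name YagoLinks and the IRIs used in the test data are hypothetical, and a proper N-Triples parser would be the safer choice beyond a quick experiment.

```java
import java.util.*;

public class YagoLinks {

    static final String SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>";

    /** Maps dbpedia resource IRIs to the linked YAGO IRIs (naive line-based parsing). */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> dbpediaToYago = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            // expected shape: <subject> <predicate> <object> .
            // comments and blank lines fail this check and are skipped
            if (parts.length >= 3 && parts[1].equals(SAME_AS)) {
                dbpediaToYago.put(strip(parts[0]), strip(parts[2]));
            }
        }
        return dbpediaToYago;
    }

    /** Removes the surrounding angle brackets of an IRI token. */
    private static String strip(String iri) {
        return iri.startsWith("<") && iri.endsWith(">")
                ? iri.substring(1, iri.length() - 1) : iri;
    }
}
```

Streaming the file with java.nio.file.Files.lines instead of passing a List would avoid holding the whole dump in memory.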
>> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <[email protected]>:
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>> >> >> >> >>>> >> huge profit".
>> >> >> >> >>>>
>> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
>> >> >> >> >>>> important, as they are often used for referencing. So I would
>> >> >> >> >>>> suggest treating the spatial context specially. For spatial entities
>> >> >> >> >>>> (like a City) this is easy, but even for others (like a Person or a
>> >> >> >> >>>> Company) you could use relations to spatial entities to define their
>> >> >> >> >>>> spatial context. This context could then be used to correctly link
>> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >> >>>>
>> >> >> >> >>>> In addition, I would suggest using the "spatial" context of each
>> >> >> >> >>>> entity (basically relations to entities that are cities, regions or
>> >> >> >> >>>> countries) as a separate dimension, because those are very often
>> >> >> >> >>>> used for coreferences.
>> >> >> >> >>>>
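A minimal sketch of the separate "spatial" dimension, assuming each candidate entity has already been annotated with the labels of related places (cities, regions, countries). SpatialContext and its methods are hypothetical names, and the place labels in the usage example are sample data, not values read from dbpedia.

```java
import java.util.*;

public class SpatialContext {

    // entity label -> lower-cased labels of related spatial entities
    private final Map<String, Set<String>> places = new LinkedHashMap<>();

    /** Records that the entity is related to the given place. */
    void addPlace(String entity, String place) {
        places.computeIfAbsent(entity, k -> new HashSet<>())
              .add(place.toLowerCase(Locale.ROOT));
    }

    /** Returns the entities whose spatial context contains one of the phrase tokens. */
    List<String> spatialMatches(List<String> phraseTokens) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : places.entrySet()) {
            for (String token : phraseTokens) {
                // strip a possessive "'s" so that "Redmond's" matches "redmond"
                String t = token.toLowerCase(Locale.ROOT).replaceAll("'s$", "");
                if (e.getValue().contains(t)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

For example, after adding Redmond and Washington for Microsoft and Cupertino for Apple, spatialMatches(List.of("The", "Redmond's", "company")) returns [Microsoft]. Multi-word places such as "United States" would need phrase matching, which this sketch does not attempt.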
>> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >> >>>> <[email protected]> wrote:
>> >> >> >> >>>> > There are several dbpedia categories for each entity. In this case,
>> >> >> >> >>>> > for Microsoft we have:
>> >> >> >> >>>> >
>> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >> >>>> > category:Microsoft
>> >> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >> >>>> >
>> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington", which
>> >> >> >> >>>> > could be matched.
>> >> >> >> >>>> >
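The "Redmond" match above can be sketched by splitting each category name into lower-cased words. Splitting on underscores, commas and parentheses is an assumption about how the category names are formed; CategoryTokens is a hypothetical helper, not a dbpedia or Stanbol API.

```java
import java.util.*;

public class CategoryTokens {

    /** Splits a category such as "category:Companies_based_in_Redmond,_Washington" into words. */
    static Set<String> tokens(String category) {
        String local = category.startsWith("category:")
                ? category.substring("category:".length()) : category;
        Set<String> words = new LinkedHashSet<>();
        // underscores, commas, parentheses and whitespace are treated as separators
        for (String w : local.split("[_,()\\s]+")) {
            if (!w.isEmpty()) words.add(w.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    /** Returns the categories of an entity that contain the given word. */
    static List<String> categoriesMentioning(List<String> categories, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String c : categories) {
            if (tokens(c).contains(needle)) hits.add(c);
        }
        return hits;
    }
}
```

With the category list above, categoriesMentioning(categories, "Redmond") returns only category:Companies_based_in_Redmond,_Washington, while "Washington" also hits the Software_companies_based_in_Washington_(state) category.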
>> >> >> >> >>>> >
>> >> >> >> >>>> > There is still other contextual information from dbpedia which can
>> >> >> >> >>>> > be used. For example, for an Organization we could also include:
>> >> >> >> >>>> > dbpprop:industry = Software
>> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >> >>>> >
>> >> >> >> >>>> > and for a Person (that's for Barack Obama):
>> >> >> >> >>>> >
>> >> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >> >>>> >                                dbpedia:Author
>> >> >> >> >>>> >                                dbpedia:Constitutional_law
>> >> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >> >>>> >                                dbpedia:Community_organizing
>> >> >> >> >>>> >
>> >> >> >> >>>> > I'd like to continue investigating this, as I think it may help
>> >> >> >> >>>> > increase the number of resolved coreferences. I'd like to concentrate
>> >> >> >> >>>> > on precision rather than recall, since we already have a set of
>> >> >> >> >>>> > coreferences detected by the Stanford NLP tool and this would be an
>> >> >> >> >>>> > addition to that (at least this is how I would like to use it).
>> >> >> >> >>>> >
>> >> >> >> >>>> > Is it OK if I track this by opening a Jira issue? I could update it
>> >> >> >> >>>> > to show my progress and my conclusions, and if it turns out that it
>> >> >> >> >>>> > was a bad idea, at least I'll end up with more knowledge about
>> >> >> >> >>>> > Stanbol :).
>> >> >> >> >>>> >
>> >> >> >> >>>> >
>> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <[email protected]>:
>> >> >> >> >>>> >
>> >> >> >> >>>> >> Hi Cristian,
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's advocate,
>> >> >> >> >>>> >> but I'm just not sure about the recall of the dbpedia categories
>> >> >> >> >>>> >> feature. For example, your sentence could also be "Microsoft posted
>> >> >> >> >>>> >> its 2013 earnings. The Redmond's company made a huge profit". So
>> >> >> >> >>>> >> maybe including more contextual information from dbpedia could
>> >> >> >> >>>> >> increase the recall, but of course it would reduce the precision.
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> Cheers,
>> >> >> >> >>>> >> Rafa
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
>> >> >> >> >>>> >>
>> >> >> >> >>>> >>> Back with a more detailed description of the steps for making this
>> >> >> >> >>>> >>> kind of coreference work.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> I will refer to the following text in the steps below in order to
>> >> >> >> >>>> >>> make things clearer: "Microsoft posted its 2013 earnings. The
>> >> >> >> >>>> >>> software company made a huge profit."
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 1. For every noun phrase in the text which has:
>> >> >> >> >>>> >>>      a. a determiner which implies a reference to an entity local
>> >> >> >> >>>> >>> to the text, such as "the, this, these", but not "another, every",
>> >> >> >> >>>> >>> etc., which imply a reference to an entity outside of the text.
>> >> >> >> >>>> >>>      b. at least one other noun aside from the main required noun,
>> >> >> >> >>>> >>> which further describes it. For example, I will not count "The
>> >> >> >> >>>> >>> company" as a legitimate candidate, since this could create a lot
>> >> >> >> >>>> >>> of false positives due to the double meaning of some words, such as
>> >> >> >> >>>> >>> "in the company of good people".
>> >> >> >> >>>> >>> "The software company" is a good candidate since we also have
>> >> >> >> >>>> >>> "software".
>> >> >> >> >>>> >>> 2. Match the nouns in the noun phrase to the contents of the dbpedia
>> >> >> >> >>>> >>> categories of each named entity found prior to the location of the
>> >> >> >> >>>> >>> noun phrase in the text.
>> >> >> >> >>>> >>> The dbpedia categories are in the following format (for Microsoft,
>> >> >> >> >>>> >>> for example): "Software companies of the United States".
>> >> >> >> >>>> >>>   So we try to match "software company" with that.
>> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category has a
>> >> >> >> >>>> >>> plural form, and it's the same for all categories which I saw. I
>> >> >> >> >>>> >>> don't know if there's an easier way to do this, but I thought of
>> >> >> >> >>>> >>> applying a lemmatizer to the category and the noun phrase so that
>> >> >> >> >>>> >>> they have a common denominator. This also works if the noun phrase
>> >> >> >> >>>> >>> itself has a plural form.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words in the
>> >> >> >> >>>> >>> category which are themselves nouns, and not prepositions or
>> >> >> >> >>>> >>> determiners such as "of the". This means that I need to POS tag the
>> >> >> >> >>>> >>> category contents as well.
>> >> >> >> >>>> >>> I was thinking of running the POS tagger and lemmatizer on the
>> >> >> >> >>>> >>> dbpedia categories when building the dbpedia-backed entity hub and
>> >> >> >> >>>> >>> storing the results for later use. I don't know how feasible this
>> >> >> >> >>>> >>> is at the moment.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the
>> >> >> >> >>>> >>> equivalent nouns in the categories and, based on the number of
>> >> >> >> >>>> >>> matches, create a confidence level.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 3. Match the noun of the noun phrase with the dbpedia rdf:type of the
>> >> >> >> >>>> >>> named entity. If this matches, increase the confidence level.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a certain
>> >> >> >> >>>> >>> noun phrase, link the noun phrase with the closest named entity
>> >> >> >> >>>> >>> prior to it in the text.
>> >> >> >> >>>> >>>
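Steps 1-4 can be sketched in plain Java as follows. Everything here is an illustrative assumption: tokens carry Penn Treebank POS tags, the category nouns were POS-tagged and lemmatized in advance (as suggested in step 2), lemma() is a naive plural stripper standing in for a real lemmatizer, the head noun is taken to be the last noun, and the 0.5 rdf:type bonus is arbitrary. The names CorefSketch, Token and Entity are hypothetical.

```java
import java.util.*;

public class CorefSketch {

    /** A token of the noun phrase with a Penn Treebank POS tag (assumed tag set). */
    record Token(String word, String pos) {}

    /** A named entity found earlier in the text, with category noun lemmas and rdf:type labels. */
    record Entity(String label, int offset, Set<String> categoryNouns, Set<String> types) {}

    // Step 1a: determiners that point to an entity local to the text
    static final Set<String> LOCAL_DETERMINERS = Set.of("the", "this", "these");

    /** Naive plural stripping; a stand-in for a real lemmatizer. */
    static String lemma(String noun) {
        String w = noun.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }

    /** Step 1: local determiner plus at least two nouns; returns the noun lemmas, or null. */
    static List<String> candidateNouns(List<Token> phrase) {
        if (phrase.isEmpty()
                || !LOCAL_DETERMINERS.contains(phrase.get(0).word().toLowerCase(Locale.ROOT))) {
            return null;
        }
        List<String> nouns = new ArrayList<>();
        for (Token t : phrase) {
            if (t.pos().startsWith("NN")) nouns.add(lemma(t.word()));
        }
        return nouns.size() >= 2 ? nouns : null;
    }

    /** Steps 2-3: fraction of matched nouns, plus a bonus if the head noun is an rdf:type. */
    static double confidence(List<String> nouns, Entity e) {
        long matched = nouns.stream().filter(e.categoryNouns()::contains).count();
        double score = (double) matched / nouns.size(); // nouns is never empty here
        String head = nouns.get(nouns.size() - 1);     // assumes the head noun comes last
        if (e.types().contains(head)) score += 0.5;    // arbitrary bonus weight
        return score;
    }

    /** Step 4: among matching entities located before the phrase, pick the closest one. */
    static Optional<Entity> resolve(List<Token> phrase, int phraseOffset, List<Entity> entities) {
        List<String> nouns = candidateNouns(phrase);
        if (nouns == null) return Optional.empty();
        Entity closest = null;
        for (Entity e : entities) {
            boolean before = e.offset() < phraseOffset;
            if (before && confidence(nouns, e) > 0
                    && (closest == null || e.offset() > closest.offset())) {
                closest = e;
            }
        }
        return Optional.ofNullable(closest);
    }
}
```

For "The software company" at offset 30, a Microsoft entity at offset 0 with category nouns {software, company, united, state} and rdf:type company scores 1.5 and is selected, while an entity located after the phrase is ignored. "The company" is rejected by step 1 because it has only one noun.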
>> >> >> >> >>>> >>> What do you think?
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> Cristian
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <[email protected]>:
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>>> Hi Rafa,
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> I don't have a concrete heuristic yet, but I'm working on it. I'll
>> >> >> >> >>>> >>>> post it here so that you guys can give me feedback on it.
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> What are "locality" features?
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> I looked at BART and other coref tools such as ArkRef and
>> >> >> >> >>>> >>>> CherryPicker, and they don't provide this kind of coreference.
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> Cristian
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <[email protected]>:
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>>> Hi Cristian,
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>> Without having more details about your concrete heuristic, in my
>> >> >> >> >>>> >>>>> honest opinion, such an approach could produce a lot of false
>> >> >> >> >>>> >>>>> positives. I don't know if you are planning to use some "locality"
>> >> >> >> >>>> >>>>> features to detect such coreferences, but you need to take into
>> >> >> >> >>>> >>>>> account that it is quite common for coreferenced mentions to occur
>> >> >> >> >>>> >>>>> even in different paragraphs. Although I'm not an expert in Natural
>> >> >> >> >>>> >>>>> Language Understanding, I would say it is quite difficult to get
>> >> >> >> >>>> >>>>> decent precision/recall rates for coreferencing using fixed rules.
>> >> >> >> >>>> >>>>> Maybe you can give other tools like BART (http://www.bart-coref.org/)
>> >> >> >> >>>> >>>>> a try.
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>> Cheers,
>> >> >> >> >>>> >>>>> Rafa Haro
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>>> Hi,
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event extraction
>> >> >> >> >>>> >>>>>> Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121)
>> >> >> >> >>>> >>>>>> is to have coreference resolution in the given text. This is
>> >> >> >> >>>> >>>>>> currently provided via the stanford-nlp project, but as far as I saw
>> >> >> >> >>>> >>>>>> this module mostly performs pronominal (He, She) or nominal (Barack
>> >> >> >> >>>> >>>>>> Obama and Mr. Obama) coreference resolution.
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> In order to get more coreferences from the text, I thought of
>> >> >> >> >>>> >>>>>> creating some logic that would detect this kind of coreference:
>> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
>> >> >> >> >>>> >>>>>> announced its 2013 earnings."
>> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >> >> >> >>>> >>>>>> So I'd like to detect coreferring mentions of Named Entities which
>> >> >> >> >>>> >>>>>> name the rdf:type of the Named Entity, in this case "company", and
>> >> >> >> >>>> >>>>>> also have attributes which can be found in the dbpedia categories of
>> >> >> >> >>>> >>>>>> the named entity, in this case "software".
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> Mentions such as "The software company" would be detected in the
>> >> >> >> >>>> >>>>>> text either by using the new POS tag based Phrase extraction Engine
>> >> >> >> >>>> >>>>>> (noun phrases) or by using a dependency tree of the sentence and
>> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> At this point I'd like to know whether this kind of logic would be
>> >> >> >> >>>> >>>>>> useful in Stanbol as a separate Enhancement Engine (in case the
>> >> >> >> >>>> >>>>>> precision and recall are good enough).
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> Thanks,
>> >> >> >> >>>> >>>>>> Cristian
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>> --
>> >> >> >> >>>> | Rupert Westenthaler             [email protected]
>> >> >> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >> >>>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>



