Re: Named entity coref resolution based on dbpedia categories and rdf:type

Rupert Westenthaler Thu, 20 Mar 2014 02:26:36 -0700

Hi Cristian,

On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
<[email protected]> wrote:
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> service.ranking=I"-2147483648"
> stanbol.enhancer.chain.name="default"


This looks fine to me. Do you see any exception during the startup of
the launcher? Can you check the status of this component in the
components tab of the Felix web console [1] (search for
"org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
you have multiple instances, you can find the correct one by comparing
the "Properties" with those in the configuration file.

I guess that the corresponding service is in the 'unsatisfied' state, as
you do not see it in the web interface. If this is the case, you should
also see the corresponding exception in the log. You can also manually
stop/start the component. In this case the exception should be
re-thrown and you do not need to search the log for it.
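
When comparing the "Properties" in the console with the configuration
file, it can help to split the value of
stanbol.enhancer.chain.weighted.chain into its entries. The following is
only a minimal sketch in plain Java, not Stanbol's own parser: the class
ChainConfigSketch and its parse method are hypothetical names, and it
assumes that each entry is an engine name optionally followed by
";optional", as in the "tika;optional" entry quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainConfigSketch {

    /** One entry of the chain list: an engine name plus an "optional" flag. */
    public static final class Entry {
        public final String engine;
        public final boolean optional;

        Entry(String engine, boolean optional) {
            this.engine = engine;
            this.optional = optional;
        }
    }

    /** Splits a value such as ["tika;optional","langdetect"] into entries. */
    public static List<Entry> parse(String value) {
        String body = value.trim();
        // Drop the surrounding [ ] of the array notation if present.
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        List<Entry> entries = new ArrayList<>();
        for (String raw : body.split(",")) {
            // Remove the double quotes around each entry.
            String item = raw.trim().replace("\"", "");
            if (item.isEmpty()) {
                continue;
            }
            // Assumption: parameters follow the engine name, separated by ';'.
            String[] parts = item.split(";");
            boolean optional = false;
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].trim().equals("optional")) {
                    optional = true;
                }
            }
            entries.add(new Entry(parts[0].trim(), optional));
        }
        return entries;
    }

    public static void main(String[] args) {
        String value = "[\"tika;optional\",\"langdetect\",\"pos-chunker\"]";
        for (Entry e : parse(value)) {
            System.out.println(e.engine + (e.optional ? " (optional)" : ""));
        }
    }
}
```

Printing the entries of both the file and the console value side by side
makes a missing or misspelled engine name easier to spot.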

best
Rupert


[1] http://localhost:8080/system/console/components

>
>
>
> 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <[email protected]
>>:
>
>> Hi Cristian,
>>
>> you cannot send attachments to the list. Please copy the contents
>> directly into the mail.
>>
>> thx
>> Rupert
>>
>> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> <[email protected]> wrote:
>> > The config attached.
>> >
>> >
>> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> > <[email protected]>:
>> >
>> >> Hi Cristian,
>> >>
>> >> can you provide the contents of the chain after your modifications?
>> >> Would be interesting to test why the chain is no longer active after
>> >> the restart.
>> >>
>> >> You can find the config file in the 'stanbol/fileinstall' folder.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>> >> <[email protected]> wrote:
>> >> > Related to the default chain selection rules: before the restart I had a
>> >> > chain with the name 'default', as in I could access it via
>> >> > enhancer/chain/default.
>> >> > Then I just added another engine to the 'default' chain. I assumed that
>> >> > after the restart the chain with the 'default' name would be persisted,
>> >> > so the first rule should have been applied after the restart as well.
>> >> > Instead I cannot reach it via enhancer/chain/default anymore, so it's
>> >> > gone.
>> >> > Anyway, this is not a big deal and it's not blocking me in any way. I just
>> >> > wanted to understand where the problem is.
>> >> >
>> >> >
>> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>> >> > <[email protected]
>> >> >>:
>> >> >
>> >> >> Hi Cristian
>> >> >>
>> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> >> >> <[email protected]> wrote:
>> >> >> > 1. Updated to the latest code and it's gone. Cool
>> >> >> >
>> >> >> > 2. I start the stable launcher -> create a new instance of the
>> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>> >> >> > everything looks good and works ok.
>> >> >> > After I restart the server the default chain is gone and instead I
>> >> >> > see this in the enhancement chains page : all-active (default, id: 149,
>> >> >> > ranking: 0, impl: AllActiveEnginesChain ). all-active did not contain
>> >> >> > the 'default' word before the restart.
>> >> >> >
>> >> >>
>> >> >> Please note the default chain selection rules as described at [1]. You
>> >> >> can also access chains under '/enhancer/chain/{chain-name}'.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>> >> >>
>> >> >> > It looks like the config files are exactly what I need. Thanks.
>> >> >> >
>> >> >> >
>> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> >> >> [email protected]
>> >> >> >>:
>> >> >> >
>> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >> >> >> <[email protected]> wrote:
>> >> >> >> > Thanks Rupert.
>> >> >> >> >
>> >> >> >> > A couple more questions/issues :
>> >> >> >> >
>> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the
>> >> >> >> > console output :
>> >> >> >> >
>> >> >> >>
>> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>> >> >> >>
>> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>> >> >> >> > there are 11 engines in it. After the restart this chain now
>> >> >> >> > contains around 23 engines in total.
>> >> >> >>
>> >> >> >> I was not able to replicate this. What I tried was
>> >> >> >>
>> >> >> >> (1) start up the stable launcher
>> >> >> >> (2) add an additional engine to the default chain
>> >> >> >> (3) restart the launcher
>> >> >> >>
>> >> >> >> The default chain was not changed after (2) and (3). So I would need
>> >> >> >> further information to find out why this is happening.
>> >> >> >>
>> >> >> >> Generally it is better to create your own chain instance than to
>> >> >> >> modify one that is provided by the default configuration. I would
>> >> >> >> also recommend that you keep your test configuration in text files
>> >> >> >> and copy those to the 'stanbol/fileinstall' folder. Doing so saves
>> >> >> >> you from manually re-entering the configuration after a software
>> >> >> >> update. The production-mode section [3] provides information on how
>> >> >> >> to do that.
>> >> >> >>
>> >> >> >> best
>> >> >> >> Rupert
>> >> >> >>
>> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >> >> >> [2] http://svn.apache.org/r1576623
>> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >> >> >>
>> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>> >> >> >> >
>> >> >> >> > Despite this, the server starts fine and I can use the enhancer
>> >> >> >> > fine. Do you guys see this as well?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>> >> >> >> > there are 11 engines in it. After the restart this chain now
>> >> >> >> > contains around 23 engines in total.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >> >> >> [email protected]
>> >> >> >> >>:
>> >> >> >> >
>> >> >> >> >> Hi Cristian,
>> >> >> >> >>
>> >> >> >> >> NER annotations are typically available as both
>> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>> >> >> >> >> enhancement metadata. As you are already accessing the
>> >> >> >> >> AnalysedText, I would prefer using NlpAnnotations.NER_ANNOTATION.
>> >> >> >> >>
>> >> >> >> >> best
>> >> >> >> >> Rupert
>> >> >> >> >>
>> >> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >> >> >> >>
>> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> >> >> >> <[email protected]> wrote:
>> >> >> >> >> > Thanks.
>> >> >> >> >> > I assume I should get the named entities the same way, but with
>> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> > [email protected]>:
>> >> >> >> >> >
>> >> >> >> >> >> Hello Cristian,
>> >> >> >> >> >>
>> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You need to
>> >> >> >> >> >> use the AnalyzedText ContentPart [1].
>> >> >> >> >> >>
>> >> >> >> >> >> Here is some demo code you can use in the computeEnhancements method:
>> >> >> >> >> >>
>> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >> >> >> >> >>         if(!sections.hasNext()){ //no sentences: process as single sentence
>> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>> >> >> >> >> >>         }
>> >> >> >> >> >>
>> >> >> >> >> >>         while(sections.hasNext()){
>> >> >> >> >> >>             Section section = sections.next();
>> >> >> >> >> >>             //iterate over all chunks (phrases) within the section
>> >> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >> >> >> >>             while(chunks.hasNext()){
>> >> >> >> >> >>                 Span chunk = chunks.next();
>> >> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >> >> >> >>                 //guard: the chunk may have no phrase annotation
>> >> >> >> >> >>                 if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
>> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >> >> >> >> >>                 }
>> >> >> >> >> >>             }
>> >> >> >> >> >>         }
>> >> >> >> >> >>
>> >> >> >> >> >> hope this helps
>> >> >> >> >> >>
>> >> >> >> >> >> best
>> >> >> >> >> >> Rupert
>> >> >> >> >> >>
>> >> >> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >> >> >> >>
>> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> >> >> >> <[email protected]> wrote:
>> >> >> >> >> >> > I started to implement the engine and I'm having problems with
>> >> >> >> >> >> > getting results for noun phrases. I modified the "default" weighted
>> >> >> >> >> >> > chain to also include the PosChunkerEngine and ran a sample text :
>> >> >> >> >> >> > "Angela Merkel visited China. The german chancellor met with various
>> >> >> >> >> >> > people". I expected that the RDF XML output would contain some info
>> >> >> >> >> >> > about the noun phrases but I cannot see any.
>> >> >> >> >> >> > Could you point me to the correct way to generate the noun phrases?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Cristian
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> [email protected]>:
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> [email protected]>
>> >> >> >> >> >> >> :
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Hi Rupert,
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
>> >> >> >> >> >> >>> Yago.
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>> I will create a Jira with what we talked about here. It will
>> >> >> >> >> >> >>> probably have just a draft-like description for now and will be
>> >> >> >> >> >> >>> updated as I go along.
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>> Thanks,
>> >> >> >> >> >> >>> Cristian
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> >>> [email protected]>:
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>> Hi Cristian,
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> definitely an interesting approach. You should have a look at
>> >> >> >> >> >> >>>> Yago2 [1]. As far as I can remember the Yago taxonomy is much
>> >> >> >> >> >> >>>> better structured than the one used by dbpedia. Mapping
>> >> >> >> >> >> >>>> suggestions of dbpedia to concepts in Yago2 is easy as both
>> >> >> >> >> >> >>>> dbpedia and yago2 provide mappings [2] and [3].
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >> >> >> >> >> >>>> > <[email protected]>:
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
>> >> >> >> >> >> >>>> >> made a huge profit".
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
>> >> >> >> >> >> >>>> important as they tend to be often used for referencing. So I
>> >> >> >> >> >> >>>> would suggest to specially treat the spatial context. For
>> >> >> >> >> >> >>>> spatial Entities (like a City) this is easy, but even for others
>> >> >> >> >> >> >>>> (like a Person, Company) you could use relations to spatial
>> >> >> >> >> >> >>>> entities to define their spatial context. This context could
>> >> >> >> >> >> >>>> then be used to correctly link "The Redmond's company" to
>> >> >> >> >> >> >>>> "Microsoft".
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial" context of each
>> >> >> >> >> >> >>>> entity (basically relations to entities that are cities,
>> >> >> >> >> >> >>>> regions, countries) as a separate dimension, because those are
>> >> >> >> >> >> >>>> very often used for coreferences.
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >> >> >> >>>> <[email protected]> wrote:
>> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity, in this
>> >> >> >> >> >> >>>> > case for Microsoft we have :
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >> >> >> >>>> > category:Microsoft
>> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which
>> >> >> >> >> >> >>>> > could be matched.
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > There is still other contextual information from dbpedia which
>> >> >> >> >> >> >>>> > can be used.
>> >> >> >> >> >> >>>> > For example for an Organization we could also include :
>> >> >> >> >> >> >>>> > dbpprop:industry = Software
>> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >> >> >> >>>> >     dbpedia:Author
>> >> >> >> >> >> >>>> >     dbpedia:Constitutional_law
>> >> >> >> >> >> >>>> >     dbpedia:Lawyer
>> >> >> >> >> >> >>>> >     dbpedia:Community_organizing
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think that it may
>> >> >> >> >> >> >>>> > have some value in increasing the number of coreference
>> >> >> >> >> >> >>>> > resolutions, and I'd like to concentrate more on precision
>> >> >> >> >> >> >>>> > rather than recall since we already have a set of coreferences
>> >> >> >> >> >> >>>> > detected by the stanford nlp tool and this would be an addition
>> >> >> >> >> >> >>>> > to that (at least this is how I would like to use it).
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to
>> >> >> >> >> >> >>>> > show my progress and also my conclusions, and if it turns out
>> >> >> >> >> >> >>>> > that it was a bad idea then at least I'll end up with more
>> >> >> >> >> >> >>>> > knowledge about Stanbol in the end :).
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >> >> >> >> >> >>>> > <[email protected]>:
>> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>>> >> Hi Cristian,
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>> >> >> >> >> >> >>>> >> advocate but I'm just not sure about the recall using the
>> >> >> >> >> >> >>>> >> dbpedia categories feature. For example, your sentence could
>> >> >> >> >> >> >>>> >> also be "Microsoft posted its 2013 earnings. The Redmond's
>> >> >> >> >> >> >>>> >> company made a huge profit". So, maybe including more
>> >> >> >> >> >> >>>> >> contextual information from dbpedia could increase the recall
>> >> >> >> >> >> >>>> >> but of course will reduce the precision.
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>> >> Cheers,
>> >> >> >> >> >> >>>> >> Rafa
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>> >>> Back with a more detailed description of the steps for making
>> >> >> >> >> >> >>>> >>> this kind of coreference work.
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> I will be using references to the following text in the steps
>> >> >> >> >> >> >>>> >>> below in order to make things clearer : "Microsoft posted its
>> >> >> >> >> >> >>>> >>> 2013 earnings. The software company made a huge profit."
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
>> >> >> >> >> >> >>>> >>>      a. a determiner (by POS tag) which implies a reference to
>> >> >> >> >> >> >>>> >>> an entity local to the text (such as "the, this, these"), but
>> >> >> >> >> >> >>>> >>> not "another, every", etc. which imply a reference to an entity
>> >> >> >> >> >> >>>> >>> outside of the text.
>> >> >> >> >> >> >>>> >>>      b. at least one other noun aside from the main required
>> >> >> >> >> >> >>>> >>> noun, which further describes it. For example I will not count
>> >> >> >> >> >> >>>> >>> "The company" as being a legitimate candidate since this could
>> >> >> >> >> >> >>>> >>> create a lot of false positives by considering the double
>> >> >> >> >> >> >>>> >>> meaning of some words such as "in the company of good people".
>> >> >> >> >> >> >>>> >>> "The software company" is a good candidate since we also have
>> >> >> >> >> >> >>>> >>> "software".
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> 2. Match the nouns in the noun phrase to the contents of the
>> >> >> >> >> >> >>>> >>> dbpedia categories of each named entity found prior to the
>> >> >> >> >> >> >>>> >>> location of the noun phrase in the text.
>> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following format (for
>> >> >> >> >> >> >>>> >>> Microsoft for example) : "Software companies of the United
>> >> >> >> >> >> >>>> >>> States".
>> >> >> >> >> >> >>>> >>> So we try to match "software company" with that.
>> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category has
>> >> >> >> >> >> >>>> >>> a plural form and it's the same for all categories which I saw.
>> >> >> >> >> >> >>>> >>> I don't know if there's an easier way to do this but I thought
>> >> >> >> >> >> >>>> >>> of applying a lemmatizer on the category and the noun phrase in
>> >> >> >> >> >> >>>> >>> order for them to have a common denominator. This also works if
>> >> >> >> >> >> >>>> >>> the noun phrase itself has a plural form.
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words in the
>> >> >> >> >> >> >>>> >>> category which are themselves nouns and not prepositions or
>> >> >> >> >> >> >>>> >>> determiners such as "of the". This means that I need to pos tag
>> >> >> >> >> >> >>>> >>> the categories' contents as well.
>> >> >> >> >> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
>> >> >> >> >> >> >>>> >>> categories when building the dbpedia backed entity hub and
>> >> >> >> >> >> >>>> >>> storing them for later use - I don't know how feasible this is
>> >> >> >> >> >> >>>> >>> at the moment.
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the
>> >> >> >> >> >> >>>> >>> equivalent nouns in the categories and based on the number of
>> >> >> >> >> >> >>>> >>> matches I can create a confidence level.
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> 3. Match the noun of the noun phrase with the rdf:type from
>> >> >> >> >> >> >>>> >>> dbpedia of the named entity. If this matches, increase the
>> >> >> >> >> >> >>>> >>> confidence level.
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a
>> >> >> >> >> >> >>>> >>> certain noun phrase then link the noun phrase with the closest
>> >> >> >> >> >> >>>> >>> named entity prior to it in the text.
>> >> >> >> >> >> >>>> >>>
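
Steps 1 and 2 above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from Stanbol: the class
CategoryCorefSketch and all of its methods are hypothetical, the
"lemmatizer" only strips regular English plurals, and a small stop-word
list stands in for real POS tagging of the category labels. The
confidence is taken as the best fraction of noun phrase nouns found in a
single category, which is one possible reading of "based on the number of
matches".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CategoryCorefSketch {

    // Determiners that point to an entity inside the text (step 1a).
    private static final Set<String> LOCAL_DETERMINERS =
            new HashSet<>(Arrays.asList("the", "this", "these"));

    // Words skipped in category labels; a stand-in for real POS tagging.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("of", "the", "in", "based", "established"));

    /** Step 1: a local determiner plus at least two nouns in the phrase. */
    public static boolean isCandidate(String determiner, List<String> nouns) {
        return determiner != null
                && LOCAL_DETERMINERS.contains(determiner.toLowerCase(Locale.ROOT))
                && nouns.size() >= 2;
    }

    /** Very naive lemmatizer: lower-cases and strips regular plurals only. */
    public static String lemma(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies") && w.length() > 4) {
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Lemmas of the content words of a category label. */
    public static Set<String> categoryLemmas(String label) {
        Set<String> lemmas = new HashSet<>();
        for (String token : label.split("[\\s_,()]+")) {
            if (!token.isEmpty()
                    && !STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                lemmas.add(lemma(token));
            }
        }
        return lemmas;
    }

    /** Step 2: best fraction (0..1) of phrase nouns found in one category. */
    public static double confidence(List<String> nouns, List<String> categories) {
        if (nouns.isEmpty()) {
            return 0.0;
        }
        double best = 0.0;
        for (String category : categories) {
            Set<String> lemmas = categoryLemmas(category);
            int hits = 0;
            for (String noun : nouns) {
                if (lemmas.contains(lemma(noun))) {
                    hits++;
                }
            }
            best = Math.max(best, (double) hits / nouns.size());
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> nouns = Arrays.asList("software", "company");
        List<String> categories = Arrays.asList(
                "Software_companies_of_the_United_States",
                "Cloud_computing_providers");
        if (isCandidate("The", nouns)) {
            System.out.println(confidence(nouns, categories));
        }
    }
}
```

The rdf:type check of step 3 and the "closest prior entity" rule of step 4
are left out here; they would adjust or compare the score returned by
confidence().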
>> >> >> >> >> >> >>>> >>> What do you think?
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> Cristian
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
>> >> >> [email protected]>:
>> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >>>> >>>  Hi Rafa,
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working on it.
>> >> >> >> >> >> >>>> >>>> I'll provide it here so that you guys can give me feedback on
>> >> >> >> >> >> >>>> >>>> it.
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> What are "locality" features?
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> >> >> >> >> >> >>>> >>>> CherryPicker and they don't provide such a coreference.
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> Cristian
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <[email protected]>:
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>> Hi Cristian,
>> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >>>> >>>>> Without having more details about your concrete heuristic, in
>> >> >> >> >> >> >>>> >>>>> my honest opinion, such an approach could produce a lot of
>> >> >> >> >> >> >>>> >>>>> false positives. I don't know if you are planning to use some
>> >> >> >> >> >> >>>> >>>>> "locality" features to detect such coreferences but you need
>> >> >> >> >> >> >>>> >>>>> to take into account that it is quite usual that coreferenced
>> >> >> >> >> >> >>>> >>>>> mentions occur even in different paragraphs. Although I'm not
>> >> >> >> >> >> >>>> >>>>> an expert in Natural Language Understanding, I would say it is
>> >> >> >> >> >> >>>> >>>>> quite difficult to get decent precision/recall rates for
>> >> >> >> >> >> >>>> >>>>> coreferencing using fixed rules. Maybe you can give a try to
>> >> >> >> >> >> >>>> >>>>> other tools like BART (http://www.bart-coref.org/).
>> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >>>> >>>>> Cheers,
>> >> >> >> >> >> >>>> >>>>> Rafa Haro
>> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >>>> >>>>>   Hi,
>> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>> >> >> >> >> >> >>>> >>>>>> extraction Engine feature
>> >> >> >> >> >> >>>> >>>>>> (https://issues.apache.org/jira/browse/STANBOL-1121) is to
>> >> >> >> >> >> >>>> >>>>>> have coreference resolution in the given text. This is
>> >> >> >> >> >> >>>> >>>>>> provided now via the stanford-nlp project but as far as I saw
>> >> >> >> >> >> >>>> >>>>>> this module is performing mostly pronominal (He, She) or
>> >> >> >> >> >> >>>> >>>>>> nominal (Barack Obama and Mr. Obama) coreference resolution.
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I thought of
>> >> >> >> >> >> >>>> >>>>>> creating some logic that would detect this kind of
>> >> >> >> >> >> >>>> >>>>>> coreference :
>> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
>> >> >> >> >> >> >>>> >>>>>> announced its 2013 earnings."
>> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which
>> >> >> >> >> >> >>>> >>>>>> are of the rdf:type of the Named Entity, in this case
>> >> >> >> >> >> >>>> >>>>>> "company", and also have attributes which can be found in the
>> >> >> >> >> >> >>>> >>>>>> dbpedia categories of the named entity, in this case
>> >> >> >> >> >> >>>> >>>>>> "software".
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software company"
>> >> >> >> >> >> >>>> >>>>>> in the text would also be done by either using the new Pos
>> >> >> >> >> >> >>>> >>>>>> Tag Based Phrase extraction Engine (noun phrases) or by using
>> >> >> >> >> >> >>>> >>>>>> a dependency tree of the sentence and picking up only
>> >> >> >> >> >> >>>> >>>>>> subjects or objects.
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would be
>> >> >> >> >> >> >>>> >>>>>> useful as a separate Enhancement Engine (in case the
>> >> >> >> >> >> >>>> >>>>>> precision and recall are good enough) in Stanbol?
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>> Thanks,
>> >> >> >> >> >> >>>> >>>>>> Cristian
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>> --
>> >> >> >> >> >> >>>> | Rupert Westenthaler             [email protected]
>> >> >> >> >> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>>
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> --
>> >> >> >> >> >> | Rupert Westenthaler             [email protected]
>> >> >> >> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> | Rupert Westenthaler             [email protected]
>> >> >> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> | Rupert Westenthaler             [email protected]
>> >> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >> | A-5500 Bischofshofen
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             [email protected]
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             [email protected]
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >
>> >
>>
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
