Re: Named entity coref resolution based on dbpedia categories and rdf:type

Cristian Petroaca Sat, 15 Mar 2014 12:34:36 -0700

Thanks Rupert.

A couple more questions/issues:


1. Whenever I start the Stanbol server I see this in the console output:

ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
        at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
        at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
        at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
        at java.lang.Thread.run(Unknown Source)

Despite this, the server starts and I can use the enhancer without problems.
Do you guys see this as well?


2. Whenever I restart the server, the weighted chains get messed up. I
usually take the 'default' chain and add my engine to it, so it contains 11
engines. After the restart, this chain contains around 23 engines in total.
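The unresolved constraint in point 1 is an OSGi package import expressed as an LDAP filter: (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))) asks for an exported javax.ws.rs package whose version lies in the half-open range [0.0.0, 2.0.0). The sketch below only illustrates how that filter evaluates for a few exported versions. The class name is hypothetical and the comparison is simplified (numeric major.minor.micro only, qualifier ignored), so this is not how Felix itself resolves bundles.

```java
import java.util.Arrays;

/** Hypothetical helper: evaluates the range encoded by
 *  (version>=0.0.0)(!(version>=2.0.0)), i.e. [0.0.0, 2.0.0). */
class JaxRsImportRange {

    // Parse "major.minor.micro"; missing parts default to 0, any qualifier is ignored.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    // Compare the numeric parts left to right.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    /** True if an exported javax.ws.rs version satisfies the import filter. */
    static boolean satisfies(String exported) {
        int[] v = parse(exported);
        boolean atLeastFloor = compare(v, parse("0.0.0")) >= 0;    // (version>=0.0.0)
        boolean belowCeiling = !(compare(v, parse("2.0.0")) >= 0); // (!(version>=2.0.0))
        return atLeastFloor && belowCeiling;
    }

    public static void main(String[] args) {
        for (String exported : Arrays.asList("1.1.1", "1.9.0", "2.0.0", "2.0.1")) {
            System.out.println(exported + " -> " + satisfies(exported));
        }
    }
}
```

Under this reading, a runtime that exports only a 2.x javax.ws.rs package could not satisfy the import; whether that is what happens in this launcher is not established in the thread.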




2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <[email protected]>:

> Hi Cristian,
>
> NER annotations are typically available both as
> NlpAnnotations.NER_ANNOTATION and as fise:TextAnnotation [1] in the
> enhancement metadata. As you are already accessing the AnalysedText, I
> would recommend using NlpAnnotations.NER_ANNOTATION.
>
> best
> Rupert
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
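As a rough, self-contained illustration of reading NER results from the analysed text rather than from the RDF metadata: walk the annotated spans in text order and keep those carrying a named-entity annotation, together with their offsets and type. The Span class, its nerType field and the sample offsets are hypothetical stand-ins, not Stanbol's AnalysedText or NlpAnnotations API; inside an engine the type would come from the NlpAnnotations.NER_ANNOTATION value of each span instead.

```java
import java.util.ArrayList;
import java.util.List;

class NerCollector {

    /** Hypothetical stand-in for an annotated span of the analysed text. */
    static final class Span {
        final int start, end;
        final String text;
        final String nerType; // null when the span carries no NER annotation
        Span(int start, int end, String text, String nerType) {
            this.start = start; this.end = end; this.text = text; this.nerType = nerType;
        }
    }

    /** Keeps only spans that carry a named-entity annotation, preserving text order. */
    static List<Span> namedEntities(List<Span> spans) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s.nerType != null) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Angela Merkel visited China." with hand-made annotations
        List<Span> spans = List.of(
            new Span(0, 13, "Angela Merkel", "Person"),
            new Span(14, 21, "visited", null),
            new Span(22, 27, "China", "Place"));
        for (Span ne : namedEntities(spans)) {
            System.out.printf("%s [%d,%d] %s%n", ne.nerType, ne.start, ne.end, ne.text);
        }
    }
}
```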
>
> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> <[email protected]> wrote:
> > Thanks.
> > I assume I should get the named entities the same way, but with
> > NlpAnnotations.NER_ANNOTATION?
> >
> >
> >
> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <[email protected]>:
> >
> >> Hello Cristian,
> >>
> >> NounPhrases are not added to the RDF enhancement results. You need to
> >> use the AnalysedText ContentPart [1].
> >>
> >> Here is some demo code you can use in the computeEnhancements method:
> >>
> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >>         Iterator<? extends Section> sections = at.getSentences();
> >>         if(!sections.hasNext()){ //process as single sentence
> >>             sections = Collections.singleton(at).iterator();
> >>         }
> >>
> >>         while(sections.hasNext()){
> >>             Section section = sections.next();
> >>             Iterator<Span> chunks =
> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >>             while(chunks.hasNext()){
> >>                 Span chunk = chunks.next();
> >>                 Value<PhraseTag> phrase =
> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >>                 // chunks without a phrase annotation are skipped
> >>                 if(phrase != null
> >>                         && phrase.value().getCategory() == LexicalCategory.Noun){
> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >>                         chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
> >>                 }
> >>             }
> >>         }
> >>
> >> Hope this helps.
> >>
> >> best
> >> Rupert
> >>
> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >>
> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> <[email protected]> wrote:
> >> > I started implementing the engine and I'm having problems getting
> >> > results for noun phrases. I modified the "default" weighted chain to
> >> > also include the PosChunkerEngine and ran it on a sample text: "Angela
> >> > Merkel visited China. The German chancellor met with various people".
> >> > I expected the RDF/XML output to contain some information about the
> >> > noun phrases, but I cannot see any.
> >> > Could you point me to the correct way to generate the noun phrases?
> >> >
> >> > Thanks,
> >> > Cristian
> >> >
> >> >
> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <[email protected]>:
> >> >
> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >>
> >> >>
> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <[email protected]>:
> >> >>
> >> >>> Hi Rupert,
> >> >>>
> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
> >> >>> Yago.
> >> >>>
> >> >>> I will create a Jira issue covering what we discussed here. For now
> >> >>> it will probably have only a draft description, which I will update
> >> >>> as I go along.
> >> >>>
> >> >>> Thanks,
> >> >>> Cristian
> >> >>>
> >> >>>
> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <[email protected]>:
> >> >>>
> >> >>>> Hi Cristian,
> >> >>>>
> >> >>>> Definitely an interesting approach. You should have a look at Yago2
> >> >>>> [1]. As far as I remember, the Yago taxonomy is much better
> >> >>>> structured than the one used by DBpedia. Mapping DBpedia
> >> >>>> suggestions to Yago2 concepts is easy, as both DBpedia and Yago2
> >> >>>> provide mappings [2] and [3].
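A minimal sketch of using the N-Triples mapping file [2], assuming (not checked here) that each line links a DBpedia resource to a YAGO resource with owl:sameAs and that the file has been decompressed first. It builds an in-memory lookup from DBpedia IRI to YAGO IRI with a simple regular expression that only accepts triples whose three terms are all IRIs; the class name and the YAGO IRI in the example are illustrative. The Turtle file [3] would need a real Turtle parser instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SameAsIndex {

    // One N-Triples statement whose subject, predicate and object are all IRIs.
    private static final Pattern TRIPLE =
        Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /** Maps subject IRI to object IRI for every line that uses the given predicate. */
    static Map<String, String> build(List<String> lines, String predicate) {
        Map<String, String> index = new HashMap<>();
        for (String line : lines) {
            Matcher m = TRIPLE.matcher(line.trim());
            if (m.matches() && m.group(2).equals(predicate)) {
                index.put(m.group(1), m.group(3));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://dbpedia.org/resource/Microsoft> <http://www.w3.org/2002/07/owl#sameAs> <http://yago-knowledge.org/resource/Microsoft> .",
            "# comment lines and literal-valued triples do not match and are skipped");
        Map<String, String> index = build(lines, OWL_SAME_AS);
        System.out.println(index.get("http://dbpedia.org/resource/Microsoft"));
    }
}
```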
> >> >>>>
> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <[email protected]>:
> >> >>>> >>
> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
> >> >>>> >> huge profit".
> >> >>>>
> >> >>>> That's actually a very good example. Spatial contexts are very
> >> >>>> important, as they are often used for references. So I would
> >> >>>> suggest treating the spatial context specially. For spatial
> >> >>>> entities (like a City) this is easy, but even for others (like a
> >> >>>> Person or a Company) you could use relations to spatial entities to
> >> >>>> define their spatial context. This context could then be used to
> >> >>>> correctly link "The Redmond's company" to "Microsoft".
> >> >>>>
> >> >>>> In addition, I would suggest using the "spatial" context of each
> >> >>>> entity (basically its relations to entities that are cities,
> >> >>>> regions or countries) as a separate dimension, because those are
> >> >>>> very often used for coreferences.
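A minimal sketch of the spatial dimension, assuming each candidate entity already carries the lower-cased labels of the places it is related to (headquarters city, region, country). A possessive or adjectival modifier from the noun phrase, such as "Redmond's", is normalized and looked up in that set, and only candidates whose spatial context contains it survive. The Entity class and the sample place labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SpatialContextMatcher {

    /** Hypothetical candidate entity with the labels of related places. */
    static final class Entity {
        final String name;
        final Set<String> places; // lower-cased labels, e.g. "redmond", "washington"
        Entity(String name, Set<String> places) { this.name = name; this.places = places; }
    }

    /** Lower-cases the modifier and strips a trailing possessive "'s". */
    static String normalize(String modifier) {
        String m = modifier.toLowerCase(Locale.ROOT);
        return m.endsWith("'s") ? m.substring(0, m.length() - 2) : m;
    }

    /** Returns the candidates whose spatial context contains the modifier. */
    static List<Entity> match(String modifier, List<Entity> candidates) {
        String key = normalize(modifier);
        List<Entity> hits = new ArrayList<>();
        for (Entity e : candidates) {
            if (e.places.contains(key)) hits.add(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entity> candidates = List.of(
            new Entity("Microsoft", Set.of("redmond", "washington", "united states")),
            new Entity("Apple", Set.of("cupertino", "california", "united states")));
        for (Entity e : match("Redmond's", candidates)) {
            System.out.println("\"The Redmond's company\" -> " + e.name);
        }
    }
}
```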
> >> >>>>
> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >>>> <[email protected]> wrote:
> >> >>>> > There are several DBpedia categories for each entity; in this
> >> >>>> > case, for Microsoft we have:
> >> >>>> >
> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >>>> > category:Microsoft
> >> >>>> > category:Software_companies_of_the_United_States
> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >>>> > category:Companies_established_in_1975
> >> >>>> > category:1975_establishments_in_the_United_States
> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
> >> >>>> > category:Cloud_computing_providers
> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >>>> >
> >> >>>> > So we also have "Companies based in Redmond, Washington", which
> >> >>>> > could be matched.
> >> >>>> >
> >> >>>> >
> >> >>>> > There is also other contextual information in DBpedia that could
> >> >>>> > be used. For example, for an Organization we could also include:
> >> >>>> > dbpprop:industry = Software
> >> >>>> > dbpprop:service = Online Service Providers
> >> >>>> >
> >> >>>> > and for a Person (this is for Barack Obama):
> >> >>>> >
> >> >>>> > dbpedia-owl:profession:
> >> >>>> >                                dbpedia:Author
> >> >>>> >                                dbpedia:Constitutional_law
> >> >>>> >                                dbpedia:Lawyer
> >> >>>> >                                dbpedia:Community_organizing
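One way to make the categories and property values above comparable with noun phrases is a per-entity bag of lower-cased content words. The sketch splits category local names on underscores, commas and parentheses and drops a small hand-picked list of words to ignore; that list is a crude, assumed stand-in for the POS filtering discussed later in the thread, and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EntityContextBag {

    // Hand-picked words to ignore; a POS tagger would do this properly.
    private static final Set<String> IGNORED =
        Set.of("of", "the", "in", "based", "by", "and");

    /** Splits e.g. "category:Companies_based_in_Redmond,_Washington" into content words. */
    static Set<String> words(String value) {
        // Drop a "prefix:" such as "category:" if present.
        String local = value.contains(":") ? value.substring(value.indexOf(':') + 1) : value;
        Set<String> result = new LinkedHashSet<>();
        for (String token : local.split("[_,()\\s]+")) {
            String w = token.toLowerCase(Locale.ROOT);
            if (!w.isEmpty() && !IGNORED.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    /** Merges the words of all category names and property values of one entity. */
    static Set<String> bag(List<String> values) {
        Set<String> bag = new LinkedHashSet<>();
        for (String v : values) bag.addAll(words(v));
        return bag;
    }

    public static void main(String[] args) {
        Set<String> microsoft = bag(List.of(
            "category:Software_companies_of_the_United_States",
            "category:Companies_based_in_Redmond,_Washington",
            "Software",                    // dbpprop:industry
            "Online Service Providers"));  // dbpprop:service
        System.out.println(microsoft);
    }
}
```

Plural forms such as "companies" are left untouched here; a lemmatizer would still be needed to match them against "company".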
> >> >>>> >
> >> >>>> > I'd like to continue investigating this, as I think it may help
> >> >>>> > increase the number of resolved coreferences. I'd like to
> >> >>>> > concentrate on precision rather than recall, since we already have
> >> >>>> > a set of coreferences detected by the Stanford NLP tool and this
> >> >>>> > would be an addition to that (at least this is how I would like
> >> >>>> > to use it).
> >> >>>> >
> >> >>>> > Is it OK if I track this by opening a Jira issue? I could update
> >> >>>> > it to show my progress and my conclusions, and if it turns out to
> >> >>>> > be a bad idea, at least I'll end up knowing more about Stanbol :).
> >> >>>> >
> >> >>>> >
> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <[email protected]>:
> >> >>>> >
> >> >>>> >> Hi Cristian,
> >> >>>> >>
> >> >>>> >> The approach sounds nice. I don't want to play devil's advocate,
> >> >>>> >> but I'm just not sure about the recall of the DBpedia categories
> >> >>>> >> feature. For example, your sentence could also be "Microsoft
> >> >>>> >> posted its 2013 earnings. The Redmond's company made a huge
> >> >>>> >> profit". So maybe including more contextual information from
> >> >>>> >> DBpedia could increase the recall, but of course it would reduce
> >> >>>> >> the precision.
> >> >>>> >>
> >> >>>> >> Cheers,
> >> >>>> >> Rafa
> >> >>>> >>
> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
> >> >>>> >>
> >> >>>> >>> Here is a more detailed description of the steps for making this
> >> >>>> >>> kind of coreference work.
> >> >>>> >>>
> >> >>>> >>> To make things clearer, the steps below refer to the following
> >> >>>> >>> text: "Microsoft posted its 2013 earnings. The software company
> >> >>>> >>> made a huge profit."
> >> >>>> >>>
> >> >>>> >>> 1. Select every noun phrase in the text which has:
> >> >>>> >>>      a. a determiner that implies a reference to an entity local
> >> >>>> >>> to the text, such as "the", "this" or "these", but not "another"
> >> >>>> >>> or "every", which imply a reference to an entity outside the text;
> >> >>>> >>>      b. at least one other noun besides the head noun, which
> >> >>>> >>> further describes it. For example, I will not count "The company"
> >> >>>> >>> as a legitimate candidate, since this could create a lot of false
> >> >>>> >>> positives given the double meaning of some words, as in "in the
> >> >>>> >>> company of good people". "The software company" is a good
> >> >>>> >>> candidate, since we also have "software".
> >> >>>> >>>
> >> >>>> >>> 2. Match the nouns in the noun phrase against the contents of the
> >> >>>> >>> DBpedia categories of each named entity found before the noun
> >> >>>> >>> phrase in the text.
> >> >>>> >>> The DBpedia categories have the following format (for Microsoft,
> >> >>>> >>> for example): "Software companies of the United States".
> >> >>>> >>> So we try to match "software company" against that.
> >> >>>> >>> First, as you can see, the head noun of the DBpedia category is in
> >> >>>> >>> plural form, and this holds for all categories I have seen. I
> >> >>>> >>> don't know if there's an easier way, but I thought of running a
> >> >>>> >>> lemmatizer on both the category and the noun phrase so that they
> >> >>>> >>> share a common form. This also works if the noun phrase itself is
> >> >>>> >>> plural.
> >> >>>> >>>
> >> >>>> >>> Second, I only need to compare against the words in the category
> >> >>>> >>> that are themselves nouns, not prepositions or determiners such as
> >> >>>> >>> "of the". This means that I need to POS-tag the category contents
> >> >>>> >>> as well. I was thinking of running the POS tagger and lemmatizer
> >> >>>> >>> on the DBpedia categories when building the DBpedia-backed
> >> >>>> >>> Entityhub and storing the results for later use; I don't know yet
> >> >>>> >>> how feasible this is.
> >> >>>> >>>
> >> >>>> >>> After this I can compare each noun in the noun phrase with the
> >> >>>> >>> corresponding nouns in the categories and derive a confidence
> >> >>>> >>> level from the number of matches.
> >> >>>> >>>
> >> >>>> >>> 3. Match the head noun of the noun phrase against the DBpedia
> >> >>>> >>> rdf:type of the named entity. If it matches, increase the
> >> >>>> >>> confidence level.
> >> >>>> >>>
> >> >>>> >>> 4. If multiple named entities match a given noun phrase, link the
> >> >>>> >>> noun phrase to the closest named entity preceding it in the text.
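The four steps can be sketched as follows, under several assumptions: tokens already carry Penn-style POS tags and lemmas, each entity already has the noun lemmas of its DBpedia categories precomputed (as suggested in step 2), and its rdf:type is reduced to a lower-cased label such as "company". The Token and Entity classes and the scoring (one point per matching category noun, one extra point for an rdf:type match on the head noun, taken to be the last noun in the phrase) are one hypothetical reading of the proposal, not an existing Stanbol engine.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CategoryCorefSketch {

    /** Hypothetical token: surface form, Penn-style POS tag, lemma. */
    static final class Token {
        final String word, pos, lemma;
        Token(String word, String pos, String lemma) { this.word = word; this.pos = pos; this.lemma = lemma; }
        boolean isNoun() { return pos.startsWith("NN"); }
    }

    /** Hypothetical entity: start offset, noun lemmas of its categories, rdf:type label. */
    static final class Entity {
        final String name; final int start; final Set<String> categoryNouns; final String type;
        Entity(String name, int start, Set<String> categoryNouns, String type) {
            this.name = name; this.start = start; this.categoryNouns = categoryNouns; this.type = type;
        }
    }

    // Step 1a: determiners taken to point at an entity inside the text.
    static final Set<String> LOCAL_DETERMINERS = Set.of("the", "this", "these");

    /** Step 1: local determiner plus at least two nouns ("The software company", not "The company"). */
    static boolean isCandidate(List<Token> phrase) {
        if (phrase.isEmpty() || !phrase.get(0).pos.equals("DT")) return false;
        if (!LOCAL_DETERMINERS.contains(phrase.get(0).word.toLowerCase(Locale.ROOT))) return false;
        return phrase.stream().filter(Token::isNoun).count() >= 2;
    }

    /** Steps 2 and 3: count category noun matches, plus one if the head noun equals the rdf:type. */
    static int confidence(List<Token> phrase, Entity e) {
        int matches = 0;
        String head = null;
        for (Token t : phrase) {
            if (!t.isNoun()) continue;
            head = t.lemma; // the last noun in the phrase ends up as its head
            if (e.categoryNouns.contains(t.lemma)) matches++;
        }
        if (matches == 0) return 0; // no category evidence: not a match
        return (head != null && head.equals(e.type)) ? matches + 1 : matches;
    }

    /** Step 4: the closest preceding entity with non-zero confidence, or null. */
    static Entity resolve(List<Token> phrase, int phraseStart, List<Entity> entities) {
        if (!isCandidate(phrase)) return null;
        Entity best = null;
        for (Entity e : entities) {
            if (e.start < phraseStart && confidence(phrase, e) > 0
                    && (best == null || e.start > best.start)) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "Microsoft posted its 2013 earnings. The software company made a huge profit."
        List<Token> phrase = List.of(
            new Token("The", "DT", "the"),
            new Token("software", "NN", "software"),
            new Token("company", "NN", "company"));
        Entity microsoft = new Entity("Microsoft", 0,
            Set.of("software", "company", "united", "state", "redmond", "washington"), "company");
        Entity resolved = resolve(phrase, 36, List.of(microsoft));
        System.out.println(resolved == null ? "no match"
            : resolved.name + " (confidence " + confidence(phrase, resolved) + ")");
    }
}
```

Returning the closest preceding match rather than the highest-scoring one follows step 4 literally; a variant could rank by confidence first and use distance only to break ties.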
> >> >>>> >>>
> >> >>>> >>> What do you think?
> >> >>>> >>>
> >> >>>> >>> Cristian
> >> >>>> >>>
> >> >>>> >>> 2014-01-31 Cristian Petroaca <[email protected]>:
> >> >>>> >>>
> >> >>>> >>>> Hi Rafa,
> >> >>>> >>>>
> >> >>>> >>>> I don't have a concrete heuristic yet, but I'm working on it.
> >> >>>> >>>> I'll post it here so that you can give me feedback on it.
> >> >>>> >>>>
> >> >>>> >>>> What are "locality" features?
> >> >>>> >>>>
> >> >>>> >>>> I looked at BART and other coreference tools such as ArkRef and
> >> >>>> >>>> CherryPicker, and they don't provide this kind of coreference.
> >> >>>> >>>>
> >> >>>> >>>> Cristian
> >> >>>> >>>>
> >> >>>> >>>>
> >> >>>> >>>> 2014-01-30 Rafa Haro <[email protected]>:
> >> >>>> >>>>
> >> >>>> >>>>> Hi Cristian,
> >> >>>> >>>>>
> >> >>>> >>>>> Without more details about your concrete heuristic, in my honest
> >> >>>> >>>>> opinion such an approach could produce a lot of false positives.
> >> >>>> >>>>> I don't know if you are planning to use some "locality" features
> >> >>>> >>>>> to detect such coreferences, but you need to take into account
> >> >>>> >>>>> that coreferent mentions quite often occur in different
> >> >>>> >>>>> paragraphs. Although I'm not an expert in Natural Language
> >> >>>> >>>>> Understanding, I would say it is quite difficult to get decent
> >> >>>> >>>>> precision/recall rates for coreference using fixed rules. Maybe
> >> >>>> >>>>> you could try other tools such as BART
> >> >>>> >>>>> (http://www.bart-coref.org/).
> >> >>>> >>>>>
> >> >>>> >>>>> Cheers,
> >> >>>> >>>>> Rafa Haro
> >> >>>> >>>>>
> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
> >> >>>> >>>>>
> >> >>>> >>>>>> Hi,
> >> >>>> >>>>>>
> >> >>>> >>>>>> One of the necessary steps for implementing the Event
> >> >>>> >>>>>> extraction Engine feature
> >> >>>> >>>>>> (https://issues.apache.org/jira/browse/STANBOL-1121) is to have
> >> >>>> >>>>>> coreference resolution in the given text. This is currently
> >> >>>> >>>>>> provided via the stanford-nlp project, but as far as I can see
> >> >>>> >>>>>> this module mostly performs pronominal (He, She) or nominal
> >> >>>> >>>>>> (Barack Obama and Mr. Obama) coreference resolution.
> >> >>>> >>>>>>
> >> >>>> >>>>>> In order to get more coreferences from the text, I thought of
> >> >>>> >>>>>> creating some logic that would detect this kind of coreference:
> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
> >> >>>> >>>>>> announced its 2013 earnings."
> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities where the
> >> >>>> >>>>>> referring phrase names the rdf:type of the Named Entity, in this
> >> >>>> >>>>>> case "company", and also has attributes that can be found in the
> >> >>>> >>>>>> DBpedia categories of the named entity, in this case "software".
> >> >>>> >>>>>>
> >> >>>> >>>>>> Mentions such as "The software company" would be detected
> >> >>>> >>>>>> either with the new Pos Tag Based Phrase extraction Engine (noun
> >> >>>> >>>>>> phrases) or with a dependency tree of the sentence, picking up
> >> >>>> >>>>>> only subjects and objects.
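For the dependency-tree option, a minimal sketch: given the typed dependency arcs of a sentence, keep only the tokens that fill a subject or direct-object role. The Arc class and the Stanford-style labels nsubj, nsubjpass and dobj are assumptions; in practice the full noun phrase would then be grown from each returned head token before applying the category matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SubjectObjectFilter {

    /** Hypothetical typed dependency arc: relation(governor, dependent), 0-based token indexes. */
    static final class Arc {
        final String relation; final int governor; final int dependent;
        Arc(String relation, int governor, int dependent) {
            this.relation = relation; this.governor = governor; this.dependent = dependent;
        }
    }

    // Subject and direct-object relations (Stanford-style labels, assumed).
    static final Set<String> CORE_ROLES = Set.of("nsubj", "nsubjpass", "dobj");

    /** Returns the dependent tokens that fill a subject or object role, in arc order. */
    static List<String> candidateHeads(List<String> tokens, List<Arc> arcs) {
        List<String> heads = new ArrayList<>();
        for (Arc a : arcs) {
            if (CORE_ROLES.contains(a.relation)) heads.add(tokens.get(a.dependent));
        }
        return heads;
    }

    public static void main(String[] args) {
        // "The software company made a huge profit ."
        List<String> tokens = List.of("The", "software", "company", "made", "a", "huge", "profit", ".");
        List<Arc> arcs = List.of(
            new Arc("det", 2, 0), new Arc("nn", 2, 1), new Arc("nsubj", 3, 2),
            new Arc("det", 6, 4), new Arc("amod", 6, 5), new Arc("dobj", 3, 6));
        System.out.println(candidateHeads(tokens, arcs)); // [company, profit]
    }
}
```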
> >> >>>> >>>>>>
> >> >>>> >>>>>> At this point I'd like to know whether this kind of logic would
> >> >>>> >>>>>> be useful in Stanbol as a separate Enhancement Engine (provided
> >> >>>> >>>>>> the precision and recall are good enough).
> >> >>>> >>>>>>
> >> >>>> >>>>>> Thanks,
> >> >>>> >>>>>> Cristian
> >> >>>> >>>>>>
> >> >>>> >>>>>>
> >> >>>> >>>>>>
> >> >>>> >>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> | Rupert Westenthaler             [email protected]
> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >>>> | A-5500 Bischofshofen
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >>
> >>
> >>
> >>
>
>
>
>
