1. Updated to the latest code and it's gone. Cool.

2. I start the stable launcher -> create a new instance of the PosChunkerEngine -> add it to the default chain. At this point everything looks good and works OK. After I restart the server the default chain is gone and instead I see this on the enhancement chains page: all-active (default, id: 149, ranking: 0, impl: AllActiveEnginesChain). all-active did not contain the word 'default' before the restart.
It looks like the config files are exactly what I need. Thanks.

2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:

> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> > Thanks Rupert.
> >
> > A couple more questions/issues:
> >
> > 1. Whenever I start the stanbol server I'm seeing this in the console output:
> >
>
> This should be fixed with STANBOL-1278 [1] [2]
>
> > 2. Whenever I restart the server the Weighted Chains get messed up. I usually use the 'default' chain and add my engine to it so there are 11 engines in it. After the restart this chain now contains around 23 engines in total.
>
> I was not able to replicate this. What I tried was
>
> (1) start up the stable launcher
> (2) add an additional engine to the default chain
> (3) restart the launcher
>
> The default chain was not changed after (2) and (3). So I would need further information to find out why this is happening.
>
> Generally it is better to create your own chain instance than to modify one that is provided by the default configuration. I would also recommend that you keep your test configuration in text files and copy those to the 'stanbol/fileinstall' folder. Doing so saves you from manually re-entering the configuration after a software update. The production-mode section [3] provides information on how to do that.
>
> best
> Rupert
>
> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> [2] http://svn.apache.org/r1576623
> [3] http://stanbol.apache.org/docs/trunk/production-mode
>
> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
> > org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
> >     at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >     at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >     at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >     at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
> >     at java.lang.Thread.run(Unknown Source)
> >
> > Despite this the server starts fine and I can use the enhancer without problems. Do you guys see this as well?
> >
> > 2. Whenever I restart the server the Weighted Chains get messed up. I usually use the 'default' chain and add my engine to it so there are 11 engines in it. After the restart this chain now contains around 23 engines in total.
> >
> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> >
> >> Hi Cristian,
> >>
> >> NER Annotations are typically available as both NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the enhancement metadata.
> >> As you are already accessing the AnalysedText I would prefer using the NlpAnnotations.NER_ANNOTATION.
> >>
> >> best
> >> Rupert
> >>
> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >>
> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> <cristian.petro...@gmail.com> wrote:
> >> > Thanks.
> >> > I assume I should get the Named Entities in the same way but with NlpAnnotations.NER_ANNOTATION?
> >> >
> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> >> >
> >> >> Hallo Cristian,
> >> >>
> >> >> NounPhrases are not added to the RDF enhancement results. You need to use the AnalyzedText ContentPart [1]
> >> >>
> >> >> here is some demo code you can use in the computeEnhancement method
> >> >>
> >> >>     AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >> >>     Iterator<? extends Section> sections = at.getSentences();
> >> >>     if(!sections.hasNext()){ //process as single sentence
> >> >>         sections = Collections.singleton(at).iterator();
> >> >>     }
> >> >>
> >> >>     while(sections.hasNext()){
> >> >>         Section section = sections.next();
> >> >>         Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >>         while(chunks.hasNext()){
> >> >>             Span chunk = chunks.next();
> >> >>             Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >>             //skip chunks that carry no phrase annotation
> >> >>             if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
> >> >>                 log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >> >>                     chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >>             }
> >> >>         }
> >> >>     }
> >> >>
> >> >> hope this helps
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >>
> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> <cristian.petro...@gmail.com> wrote:
> >> >> > I started to implement the engine and I'm having problems getting results for noun phrases. I modified the "default" weighted chain to also include the PosChunkerEngine and ran a sample text: "Angela Merkel visted China. The german chancellor met with various people". I expected that the RDF XML output would contain some info about the noun phrases but I cannot see any.
> >> >> > Could you point me to the correct way to generate the noun phrases?
> >> >> >
> >> >> > Thanks,
> >> >> > Cristian
> >> >> >
> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >
> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >>
> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >>
> >> >> >>> Hi Rupert,
> >> >> >>>
> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
> >> >> >>>
> >> >> >>> I will create a Jira with what we talked about here. It will probably have just a draft-like description for now and will be updated as I go along.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Cristian
> >> >> >>>
> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> >> >> >>>
> >> >> >>>> Hi Cristian,
> >> >> >>>>
> >> >> >>>> definitely an interesting approach. You should have a look at Yago2 [1]. As far as I can remember the Yago taxonomy is much better structured than the one used by dbpedia.
> >> >> >>>> Mapping suggestions of dbpedia to concepts in Yago2 is easy as both dbpedia and yago2 provide mappings [2] and [3]
> >> >> >>>>
> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >>
> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit".
> >> >> >>>>
> >> >> >>>> That's actually a very good example. Spatial contexts are very important as they tend to be used often for referencing. So I would suggest to treat the spatial context specially. For spatial Entities (like a City) this is easy, but even for others (like a Person or Company) you could use relations to spatial entities to define their spatial context. This context could then be used to correctly link "The Redmond's company" to "Microsoft".
> >> >> >>>>
> >> >> >>>> In addition I would suggest to use the "spatial" context of each entity (basically relations to entities that are cities, regions, countries) as a separate dimension, because those are very often used for coreferences.
> >> >> >>>>
> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >>>>
> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >>>> <cristian.petro...@gmail.com> wrote:
> >> >> >>>> > There are several dbpedia categories for each entity, in this case for Microsoft we have:
> >> >> >>>> >
> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >>>> > category:Microsoft
> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >> >>>> > category:Companies_established_in_1975
> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
> >> >> >>>> > category:Cloud_computing_providers
> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >>>> >
> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which could be matched.
> >> >> >>>> >
> >> >> >>>> > There is still other contextual information from dbpedia which can be used.
> >> >> >>>> > For example for an Organization we could also include:
> >> >> >>>> > dbpprop:industry = Software
> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >>>> >
> >> >> >>>> > and for a Person (that's for Barack Obama):
> >> >> >>>> >
> >> >> >>>> > dbpedia-owl:profession:
> >> >> >>>> >     dbpedia:Author
> >> >> >>>> >     dbpedia:Constitutional_law
> >> >> >>>> >     dbpedia:Lawyer
> >> >> >>>> >     dbpedia:Community_organizing
> >> >> >>>> >
> >> >> >>>> > I'd like to continue investigating this as I think that it may have some value in increasing the number of coreference resolutions. I'd like to concentrate more on precision rather than recall since we already have a set of coreferences detected by the stanford nlp tool and this would be an addition to that (at least this is how I would like to use it).
> >> >> >>>> >
> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to show my progress and also my conclusions, and if it turns out that it was a bad idea, then so be it; at least I'll end up with more knowledge about Stanbol in the end :).
> >> >> >>>> >
> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >
> >> >> >>>> >> Hi Cristian,
> >> >> >>>> >>
> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's advocate but I'm just not sure about the recall using the dbpedia categories feature. For example, your sentence could also be "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit".
> >> >> >>>> >> So, maybe including more contextual information from dbpedia could increase the recall but of course will reduce the precision.
> >> >> >>>> >>
> >> >> >>>> >> Cheers,
> >> >> >>>> >> Rafa
> >> >> >>>> >>
> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
> >> >> >>>> >>
> >> >> >>>> >>> Back with a more detailed description of the steps for making this kind of coreference work.
> >> >> >>>> >>>
> >> >> >>>> >>> I will be using references to the following text in the steps below in order to make things clearer: "Microsoft posted its 2013 earnings. The software company made a huge profit."
> >> >> >>>> >>>
> >> >> >>>> >>> 1. For every noun phrase in the text which has:
> >> >> >>>> >>>    a. a determiner POS tag which implies a reference to an entity local to the text (such as "the", "this", "these") but not one such as "another" or "every", which implies a reference to an entity outside of the text.
> >> >> >>>> >>>    b. at least one other noun, aside from the required main noun, which further describes it. For example I will not count "The company" as a legitimate candidate since this could create a lot of false positives because of the double meaning of some words, such as "in the company of good people". "The software company" is a good candidate since we also have "software".
> >> >> >>>> >>>
> >> >> >>>> >>> 2.
> >> >> >>>> >>> match the nouns in the noun phrase to the contents of the dbpedia categories of each named entity found prior to the location of the noun phrase in the text.
> >> >> >>>> >>> The dbpedia categories are in the following format (for Microsoft for example): "Software companies of the United States".
> >> >> >>>> >>> So we try to match "software company" with that.
> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category has a plural form and it's the same for all categories which I saw. I don't know if there's an easier way to do this but I thought of applying a lemmatizer on the category and the noun phrase in order for them to have a common denominator. This also works if the noun phrase itself has a plural form.
> >> >> >>>> >>>
> >> >> >>>> >>> Second, I'll need to use for comparison only the words in the category which are themselves nouns and not prepositions or determiners such as "of the". This means that I need to POS tag the categories' contents as well.
> >> >> >>>> >>> I was thinking of running the POS tagger and lemmatizer on the dbpedia categories when building the dbpedia backed entity hub and storing the results for later use - I don't know how feasible this is at the moment.
> >> >> >>>> >>>
> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the equivalent nouns in the categories and based on the number of matches I can create a confidence level.
> >> >> >>>> >>>
> >> >> >>>> >>> 3.
> >> >> >>>> >>> match the noun of the noun phrase with the rdf:type from dbpedia of the named entity. If this matches, increase the confidence level.
> >> >> >>>> >>>
> >> >> >>>> >>> 4. If there are multiple named entities which can match a certain noun phrase then link the noun phrase with the closest named entity prior to it in the text.
> >> >> >>>> >>>
> >> >> >>>> >>> What do you think?
> >> >> >>>> >>>
> >> >> >>>> >>> Cristian
> >> >> >>>> >>>
> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
> >> >> >>>> >>>
> >> >> >>>> >>>> Hi Rafa,
> >> >> >>>> >>>>
> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working on it. I'll provide it here so that you guys can give me feedback on it.
> >> >> >>>> >>>>
> >> >> >>>> >>>> What are "locality" features?
> >> >> >>>> >>>>
> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and CherryPicker and they don't provide such a coreference.
> >> >> >>>> >>>>
> >> >> >>>> >>>> Cristian
> >> >> >>>> >>>>
> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >>>>
> >> >> >>>> >>>>> Hi Cristian,
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Without having more details about your concrete heuristic, in my honest opinion, such an approach could produce a lot of false positives. I don't know if you are planning to use some "locality" features to detect such coreferences but you need to take into account that it is quite usual for coreferenced mentions to occur even in different paragraphs.
> >> >> >>>> >>>>> Although I'm not an expert in Natural Language Understanding, I would say it is quite difficult to get decent precision/recall rates for coreferencing using fixed rules. Maybe you can give other tools like BART (http://www.bart-coref.org/) a try.
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Cheers,
> >> >> >>>> >>>>> Rafa Haro
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>> Hi,
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event extraction Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have coreference resolution in the given text. This is provided now via the stanford-nlp project but as far as I saw this module is performing mostly pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference resolution.
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> In order to get more coreferences from the text I thought of creating some logic that would detect this kind of coreference:
> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just announced its 2013 earnings."
> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >> >> >>>> >>>>>> So I'd like to detect coreferent noun phrases whose main noun matches the rdf:type of the Named Entity (in this case "company") and which also have attributes that can be found in the dbpedia categories of the named entity (in this case "software").
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> The detection of coreferences such as "The software company" in the text would also be done either by using the new Pos Tag Based Phrase extraction Engine (noun phrases) or by using a dependency tree of the sentence and picking up only subjects or objects.
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would be useful as a separate Enhancement Engine in Stanbol (in case the precision and recall are good enough)?
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> Thanks,
> >> >> >>>> >>>>>> Cristian
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> >> >>>> | Bodenlehenstraße 11             ++43-699-11108907
> >> >> >>>> | A-5500 Bischofshofen
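
The four steps from Cristian's 04/02/14 message can be sketched in plain Java, without any Stanbol dependencies. This is only an illustration under stated assumptions, not the engine tracked in STANBOL-1279: the class CorefSketch, the Entity holder and all method names are hypothetical; lemma() is a crude plural stripper standing in for a real lemmatizer; the NON_NOUNS word list stands in for POS tagging; and both the confidence formula and the tie-break rule (highest score first, then the closest preceding entity) are one possible reading of steps 2 to 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CorefSketch {

    /** Hypothetical holder for a named entity and the dbpedia data attached to it. */
    static final class Entity {
        final String name; final int offset; final String type; final List<String> categories;
        Entity(String name, int offset, String type, List<String> categories) {
            this.name = name; this.offset = offset; this.type = type; this.categories = categories;
        }
    }

    // Determiners that point at an entity already mentioned in the text (step 1a).
    static final Set<String> LOCAL_DETERMINERS = Set.of("the", "this", "these");
    // Assumption: a fixed word list replaces POS tagging for filtering out non-nouns.
    static final Set<String> NON_NOUNS = Set.of("of", "the", "this", "these", "in", "based", "and");

    /** Crude plural stripping; a real lemmatizer would replace this. */
    static String lemma(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }

    /** Lower-cased, lemmatized content words of a phrase or category label. */
    static List<String> nouns(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !NON_NOUNS.contains(token)) out.add(lemma(token));
        }
        return new ArrayList<>(out);
    }

    /** Step 1: a local determiner plus a main noun and at least one further noun. */
    static boolean isCandidate(String phrase) {
        String first = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
        return LOCAL_DETERMINERS.contains(first) && nouns(phrase).size() >= 2;
    }

    /** Steps 2 and 3: category noun matches, plus one when the main noun equals the type. */
    static double confidence(String phrase, Entity e) {
        List<String> phraseNouns = nouns(phrase);
        if (phraseNouns.isEmpty()) return 0.0;
        Set<String> categoryNouns = new LinkedHashSet<>();
        for (String category : e.categories) categoryNouns.addAll(nouns(category));
        int hits = 0;
        for (String noun : phraseNouns) if (categoryNouns.contains(noun)) hits++;
        String head = phraseNouns.get(phraseNouns.size() - 1); // last noun taken as the main noun
        if (head.equals(lemma(e.type))) hits++;
        return hits / (double) (phraseNouns.size() + 1);
    }

    /** Step 4: best-scoring entity located before the phrase; ties go to the closest one. */
    static Entity resolve(String phrase, int phraseOffset, List<Entity> entities) {
        if (!isCandidate(phrase)) return null;
        Entity best = null;
        double bestScore = 0.0;
        for (Entity e : entities) {
            if (e.offset >= phraseOffset) continue; // only entities prior to the phrase
            double score = confidence(phrase, e);
            boolean closer = best != null && e.offset > best.offset;
            if (score > bestScore || (score == bestScore && score > 0 && closer)) {
                best = e;
                bestScore = score;
            }
        }
        return best; // null when no preceding entity matches at all
    }

    public static void main(String[] args) {
        // "Microsoft posted its 2013 earnings. The software company made a huge profit."
        Entity microsoft = new Entity("Microsoft", 0, "Company", Arrays.asList(
                "Software companies of the United States",
                "Companies based in Redmond, Washington"));
        Entity match = resolve("The software company", 36, List.of(microsoft));
        System.out.println(match == null ? "no match" : match.name);
    }
}
```

With the sample text from the thread, "The software company" resolves to Microsoft, while "The company" is rejected in step 1 because it has no noun besides the main one. Rafa's "The Redmond's company" example would need the spatial context Rupert suggests; this sketch does not model it.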