I did a clean on the whole project and now I wanted to do another "mvn clean install", but I am getting this:
"[INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6: run (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildE xception has occured: The following error occurred while executing this line: [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3 3: Failed to copy https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to C:\Data\Pr ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\ data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name" 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler < rupert.westentha...@gmail.com>: > Hi Cristian, > > On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca > <cristian.petro...@gmail.com> wrote: > > > stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"] > > service.ranking=I"-2147483648" > > stanbol.enhancer.chain.name="default" > > Does look fine to me. Do you see any exception during the startup of > the launcher. Can you check the status of this component in the > component tab of the felix web console [1] (search for > "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If > you have multiple you can find the correct one by comparing the > "Properties" with those in the configuration file. > > I guess that the according service is in the 'unsatisfied' as you do > not see it in the web interface. But if this is the case you should > also see the according exception in the log. You can also manually > stop/start the component. In this case the exception should be > re-thrown and you do not need to search the log for it. > > best > Rupert > > > [1] http://localhost:8080/system/console/components > > > > > > > > > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler < > rupert.westentha...@gmail.com > >>: > > > >> Hi Cristian, > >> > >> you can not send attachments to the list. Please copy the contents > >> directly to the mail > >> > >> thx > >> Rupert > >> > >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca > >> <cristian.petro...@gmail.com> wrote: > >> > The config attached. > >> > > >> > > >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler > >> > <rupert.westentha...@gmail.com>: > >> > > >> >> Hi Cristian, > >> >> > >> >> can you provide the contents of the chain after your modifications? > >> >> Would be interesting to test why the chain is no longer active after > >> >> the restart. > >> >> > >> >> You can find the config file in the 'stanbol/fileinstall' folder. > >> >> > >> >> best > >> >> Rupert > >> >> > >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca > >> >> <cristian.petro...@gmail.com> wrote: > >> >> > Related to the default chain selection rules : before restart I > had a > >> >> > chain > >> >> > with the name 'default' as in I could access it via > >> >> > enhancer/chain/default. > >> >> > Then I just added another engine to the 'default' chain. I assumed > >> that > >> >> > after the restart the chain with the 'default' name would be > >> persisted. > >> >> > So > >> >> > the first rule should have been applied after the restart as well. > But > >> >> > instead I cannot reach it via enhancer/chain/default anymore so its > >> >> > gone. 
2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:

> Hi Cristian,
>
> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> service.ranking=I"-2147483648"
>> stanbol.enhancer.chain.name="default"
>
> Does look fine to me. Do you see any exception during the startup of the launcher? Can you check the status of this component in the component tab of the Felix web console [1] (search for "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If you have multiple you can find the correct one by comparing the "Properties" with those in the configuration file.
>
> I guess that the according service is in the 'unsatisfied' state, as you do not see it in the web interface. But if this is the case you should also see the according exception in the log. You can also manually stop/start the component; in this case the exception should be re-thrown and you do not need to search the log for it.
>
> best
> Rupert
>
> [1] http://localhost:8080/system/console/components
>
>> 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>> Hi Cristian,
>>> you can not send attachments to the list. Please copy the contents directly into the mail.
>>> thx
>>> Rupert
>>> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>> The config attached.
>>>> 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>> Hi Cristian,
>>>>> can you provide the contents of the chain after your modifications? Would be interesting to test why the chain is no longer active after the restart.
>>>>> You can find the config file in the 'stanbol/fileinstall' folder.
>>>>> best
>>>>> Rupert
>>>>> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>> Related to the default chain selection rules: before the restart I had a chain with the name 'default', as in I could access it via enhancer/chain/default. Then I just added another engine to the 'default' chain. I assumed that after the restart the chain with the 'default' name would be persisted, so the first rule should have been applied after the restart as well. But instead I cannot reach it via enhancer/chain/default anymore, so it's gone.
>>>>>> Anyway, this is not a big deal, it's not blocking me in any way, I just wanted to understand where the problem is.
>>>>>> 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>>>> Hi Cristian,
>>>>>>> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>>>> 1. Updated to the latest code and it's gone. Cool.
>>>>>>>> 2. I start the stable launcher -> create a new instance of the PosChunkerEngine -> add it to the default chain. At this point everything looks good and works ok. After I restart the server the default chain is gone and instead I see this in the enhancement chains page: all-active (default, id: 149, ranking: 0, impl: AllActiveEnginesChain). all-active did not contain the 'default' word before the restart.
>>>>>>> Please note the default chain selection rules as described at [1]. You can also access chains under '/enhancer/chain/{chain-name}'.
>>>>>>> best
>>>>>>> Rupert
>>>>>>> [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>>>>>>> It looks like the config files are exactly what I need. Thanks.
>>>>>>>> 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>>>>>> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>>>>>> Thanks Rupert.
>>>>>>>>>> A couple more questions/issues:
>>>>>>>>>> 1. Whenever I start the stanbol server I'm seeing this in the console output:
>>>>>>>>> This should be fixed with STANBOL-1278 [1] [2]
>>>>>>>>>> 2. Whenever I restart the server the Weighted Chains get messed up. I usually use the 'default' chain and add my engine to it, so there are 11 engines in it. After the restart this chain now contains around 23 engines in total.
>>>>>>>>> I was not able to replicate this. What I tried was:
>>>>>>>>> (1) start up the stable launcher
>>>>>>>>> (2) add an additional engine to the default chain
>>>>>>>>> (3) restart the launcher
>>>>>>>>> The default chain was not changed after (2) and (3), so I would need further information to know why this is happening.
>>>>>>>>> Generally it is better to create your own chain instance than to modify one that is provided by the default configuration. I would also recommend that you keep your test configuration in text files and copy those to the 'stanbol/fileinstall' folder. Doing so prevents you from having to manually re-enter the configuration after a software update. The production-mode section [3] provides information on how to do that.
>>>>>>>>> best
>>>>>>>>> Rupert
>>>>>>>>> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>>>>>>>>> [2] http://svn.apache.org/r1576623
>>>>>>>>> [3] http://stanbol.apache.org/docs/trunk/production-mode
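To illustrate the text-file approach described above: a chain definition can be kept as an OSGi configuration file and dropped into the 'stanbol/fileinstall' folder so that it survives software updates. A minimal sketch, assuming the Felix FileInstall *.config format already used for the snippet quoted at the top of this mail, and an illustrative file name such as org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-test.config (the engine list and ranking are placeholders):

    stanbol.enhancer.chain.name="my-test-chain"
    stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-sentence","opennlp-token","opennlp-pos","pos-chunker"]
    service.ranking=I"100"

FileInstall should pick the file up on startup and re-apply it after a launcher update, which is the point of the production-mode setup linked above.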
>>>>>>>>>> ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>>>>>>>>> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>>>>>>>>>     at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>>>>>>>>     at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>>>>>>>>     at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>>>>>>>>>     at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>>>>>>>>>     at java.lang.Thread.run(Unknown Source)
>>>>>>>>>> Despite this, the server starts fine and I can use the enhancer fine. Do you guys see this as well?
>>>>>>>>>> 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>>>>>>>> Hi Cristian,
>>>>>>>>>>> NER annotations are typically available as both NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the enhancement metadata. As you are already accessing the AnalysedText I would prefer using the NlpAnnotations.NER_ANNOTATION.
>>>>>>>>>>> best
>>>>>>>>>>> Rupert
>>>>>>>>>>> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>>>>>>>>>> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> I assume I should get the Named Entities using the same approach but with NlpAnnotations.NER_ANNOTATION?
>>>>>>>>>>>> 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>>>>>>>>>> Hallo Cristian,
>>>>>>>>>>>>> NounPhrases are not added to the RDF enhancement results. You need to use the AnalyzedText ContentPart [1].
>>>>>>>>>>>>> Here is some demo code you can use in the computeEnhancement method:
>>>>>>>>>>>>>
>>>>>>>>>>>>> AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>>>>>>>>>>>> Iterator<? extends Section> sections = at.getSentences();
>>>>>>>>>>>>> if(!sections.hasNext()){ // process as a single sentence
>>>>>>>>>>>>>     sections = Collections.singleton(at).iterator();
>>>>>>>>>>>>> }
>>>>>>>>>>>>> while(sections.hasNext()){
>>>>>>>>>>>>>     Section section = sections.next();
>>>>>>>>>>>>>     Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>>>>>>>>>>>>     while(chunks.hasNext()){
>>>>>>>>>>>>>         Span chunk = chunks.next();
>>>>>>>>>>>>>         Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>>>>>>>>>>>>         if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
>>>>>>>>>>>>>             log.info(" - NounPhrase [{},{}] {}", new Object[]{chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> hope this helps
>>>>>>>>>>>>> best
>>>>>>>>>>>>> Rupert
>>>>>>>>>>>>> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
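Following the same pattern, reading the NER annotations mentioned a few messages above would presumably look like the untested sketch below. It only swaps the annotation key and otherwise reuses the names from the demo code; NerTag is assumed to be the value type behind NlpAnnotations.NER_ANNOTATION, and getEnclosed() is assumed to be callable on the AnalysedText directly:

    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    // named entities are Chunk spans that carry a NER annotation
    Iterator<Span> spans = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
    while(spans.hasNext()){
        Span span = spans.next();
        Value<NerTag> ner = span.getAnnotation(NlpAnnotations.NER_ANNOTATION);
        if(ner != null){ // skip chunks that are not named entities
            log.info(" - NamedEntity [{},{}] {} -> {}", new Object[]{
                    span.getStart(), span.getEnd(), span.getSpan(), ner.value()});
        }
    }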
>>>>>>>>>>>>> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>>>>>>>>>> I started to implement the engine and I'm having problems with getting results for noun phrases. I modified the "default" weighted chain to also include the PosChunkerEngine and ran a sample text: "Angela Merkel visited China. The German chancellor met with various people". I expected that the RDF/XML output would contain some info about the noun phrases but I cannot see any.
>>>>>>>>>>>>>> Could you point me to the correct way to generate the noun phrases?
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Cristian
>>>>>>>>>>>>>> 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
>>>>>>>>>>>>>>> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>>>>>>>>>>>>>>> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
>>>>>>>>>>>>>>>> Hi Rupert,
>>>>>>>>>>>>>>>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>>>>>>>>>>>>>>>> I will create a Jira with what we talked about here. It will probably have just a draft-like description for now and will be updated as I go along.
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Cristian
>>>>>>>>>>>>>>>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
>>>>>>>>>>>>>>>>> Hi Cristian,
>>>>>>>>>>>>>>>>> definitely an interesting approach. You should have a look at Yago2 [1]. As far as I can remember the Yago taxonomy is much better structured than the one used by dbpedia. Mapping suggestions of dbpedia to concepts in Yago2 is easy, as both dbpedia and yago2 do provide mappings [2] and [3].
>>>>>>>>>>>>>>>>>> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>>>>>>>>>>>>>>>>> "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit".
>>>>>>>>>>>>>>>>> That's actually a very good example. Spatial contexts are very important as they tend to be often used for referencing, so I would suggest to specially treat the spatial context. For spatial entities (like a City) this is easy, but even for others (like a Person or a Company) you could use relations to spatial entities to define their spatial context. This context could then be used to correctly link "The Redmond's company" to "Microsoft".
>>>>>>>>>>>>>>>>> In addition I would suggest to use the "spatial" context of each entity (basically relations to entities that are cities, regions or countries) as a separate dimension, because those are very often used for coreferences.
>>>>>>>>>>>>>>>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>>>>>>>>>>>>>>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>>>>>>>>>>>>>>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>>>>>>>>>>>>>>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> There are several dbpedia categories for each entity; in this case for Microsoft we have:
>>>>>>>>>>>>>>>>>> category:Companies_in_the_NASDAQ-100_Index
>>>>>>>>>>>>>>>>>> category:Microsoft
>>>>>>>>>>>>>>>>>> category:Software_companies_of_the_United_States
>>>>>>>>>>>>>>>>>> category:Software_companies_based_in_Washington_(state)
>>>>>>>>>>>>>>>>>> category:Companies_established_in_1975
>>>>>>>>>>>>>>>>>> category:1975_establishments_in_the_United_States
>>>>>>>>>>>>>>>>>> category:Companies_based_in_Redmond,_Washington
>>>>>>>>>>>>>>>>>> category:Multinational_companies_headquartered_in_the_United_States
>>>>>>>>>>>>>>>>>> category:Cloud_computing_providers
>>>>>>>>>>>>>>>>>> category:Companies_in_the_Dow_Jones_Industrial_Average
>>>>>>>>>>>>>>>>>> So we also have "Companies based in Redmond, Washington", which could be matched.
>>>>>>>>>>>>>>>>>> There is still other contextual information from dbpedia which can be used.
>>>>>>>>>>>>>>>>>> For example, for an Organization we could also include:
>>>>>>>>>>>>>>>>>> dbpprop:industry = Software
>>>>>>>>>>>>>>>>>> dbpprop:service = Online Service Providers
>>>>>>>>>>>>>>>>>> and for a Person (that's for Barack Obama):
>>>>>>>>>>>>>>>>>> dbpedia-owl:profession:
>>>>>>>>>>>>>>>>>> dbpedia:Author
>>>>>>>>>>>>>>>>>> dbpedia:Constitutional_law
>>>>>>>>>>>>>>>>>> dbpedia:Lawyer
>>>>>>>>>>>>>>>>>> dbpedia:Community_organizing
>>>>>>>>>>>>>>>>>> I'd like to continue investigating this as I think it may have some value in increasing the number of coreference resolutions, and I'd like to concentrate more on precision rather than recall, since we already have a set of coreferences detected by the Stanford NLP tool and this would be an addition to that (at least this is how I would like to use it).
>>>>>>>>>>>>>>>>>> Is it ok if I track this by opening a Jira? I could update it to show my progress and also my conclusions, and if it turns out that it was a bad idea, then that's the situation; at least I'll end up with more knowledge about Stanbol in the end :).
>>>>>>>>>>>>>>>>>> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>>>>>>>>>>>>>>>>> Hi Cristian,
>>>>>>>>>>>>>>>>>>> The approach sounds nice. I don't want to be the devil's advocate, but I'm just not sure about the recall when using the dbpedia categories feature. For example, your sentence could also be "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit". So maybe including more contextual information from dbpedia could increase the recall, but of course it will reduce the precision.
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Rafa
>>>>>>>>>>>>>>>>>>> On 04/02/14 09:50, Cristian Petroaca wrote:
>>>>>>>>>>>>>>>>>>>> Back with a more detailed description of the steps for making this kind of coreference work.
>>>>>>>>>>>>>>>>>>>> I will be using references to the following text in the steps below in order to make things clearer: "Microsoft posted its 2013 earnings. The software company made a huge profit."
>>>>>>>>>>>>>>>>>>>> 1. Consider every noun phrase in the text which has:
>>>>>>>>>>>>>>>>>>>>    a. a determiner POS which implies a reference to an entity local to the text, such as "the, this, these", but not "another", "every", etc., which imply a reference to an entity outside of the text;
>>>>>>>>>>>>>>>>>>>>    b. at least one other noun aside from the main required noun which further describes it. For example I will not count "The company" as a legitimate candidate, since this could create a lot of false positives by considering the double meaning of some words, such as in "in the company of good people". "The software company" is a good candidate since we also have "software".
>>>>>>>>>>>>>>>>>>>> 2. Match the nouns in the noun phrase to the contents of the dbpedia categories of each named entity found prior to the location of the noun phrase in the text. The dbpedia categories are in the following format (for Microsoft, for example): "Software companies of the United States". So we try to match "software company" with that.
>>>>>>>>>>>>>>>>>>>> First, as you can see, the main noun in the dbpedia category has a plural form, and it's the same for all categories which I saw.
>>>>>>>>>>>>>>>>>>>> I don't know if there's an easier way to do this, but I thought of applying a lemmatizer to the category and to the noun phrase in order for them to have a common denominator. This also works if the noun phrase itself has a plural form.
>>>>>>>>>>>>>>>>>>>> Second, I'll need to use for comparison only the words in the category which are themselves nouns and not prepositions or determiners such as "of the". This means that I need to POS tag the categories' contents as well. I was thinking of running the POS tagging and lemmatization on the dbpedia categories when building the dbpedia-backed entityhub and storing them for later use - I don't know how feasible this is at the moment.
>>>>>>>>>>>>>>>>>>>> After this I can compare each noun in the noun phrase with the equivalent nouns in the categories and, based on the number of matches, I can create a confidence level.
>>>>>>>>>>>>>>>>>>>> 3. Match the noun of the noun phrase with the rdf:type from dbpedia of the named entity. If this matches, increase the confidence level.
>>>>>>>>>>>>>>>>>>>> 4. If there are multiple named entities which can match a certain noun phrase, then link the noun phrase with the closest named entity prior to it in the text.
>>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>> Cristian
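As a rough, untested illustration of step 2 above: the category matching essentially boils down to counting overlapping lemmas between the nouns of the noun phrase and the tokens of the category labels. The class and method names below (CategoryMatcher, naiveLemma, confidence) are made up for the example, and the real lemmatizer and POS filtering are replaced by a crude plural-stripping placeholder:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Illustration of step 2: score a noun phrase against the dbpedia category labels of an entity. */
    public class CategoryMatcher {

        /** Crude stand-in for a real lemmatizer: lower-cases and strips simple plural endings. */
        static String naiveLemma(String word) {
            String w = word.toLowerCase();
            if (w.endsWith("ies") && w.length() > 4) {
                return w.substring(0, w.length() - 3) + "y"; // companies -> company
            }
            if (w.endsWith("s") && w.length() > 3) {
                return w.substring(0, w.length() - 1);       // providers -> provider
            }
            return w;
        }

        /**
         * Fraction of noun-phrase nouns whose lemma also occurs in one of the category labels.
         * A real implementation would keep only the POS-tagged nouns of each label; here every
         * token of the label is used.
         */
        static double confidence(List<String> nounPhraseNouns, List<String> categoryLabels) {
            Set<String> categoryLemmas = new HashSet<String>();
            for (String label : categoryLabels) {
                for (String token : label.split("[\\s_,]+")) {
                    categoryLemmas.add(naiveLemma(token));
                }
            }
            int matches = 0;
            for (String noun : nounPhraseNouns) {
                if (categoryLemmas.contains(naiveLemma(noun))) {
                    matches++;
                }
            }
            return nounPhraseNouns.isEmpty() ? 0.0 : (double) matches / nounPhraseNouns.size();
        }

        public static void main(String[] args) {
            double score = confidence(
                    Arrays.asList("software", "company"),
                    Arrays.asList("Software companies of the United States",
                                  "Companies based in Redmond, Washington"));
            System.out.println("confidence = " + score); // 1.0 for this example
        }
    }

Steps 3 and 4 would then only adjust this score: bump the confidence when the head noun also matches the entity's rdf:type, and break ties by picking the closest preceding entity.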
>>>>>>>>>>>>>>>>>>>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>> Hi Rafa,
>>>>>>>>>>>>>>>>>>>>> I don't yet have a concrete heuristic but I'm working on it. I'll provide it here so that you guys can give me feedback on it.
>>>>>>>>>>>>>>>>>>>>> What are "locality" features?
>>>>>>>>>>>>>>>>>>>>> I looked at BART and other coref tools such as ArkRef and CherryPicker and they don't provide such a coreference.
>>>>>>>>>>>>>>>>>>>>> Cristian
>>>>>>>>>>>>>>>>>>>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>>>>>>>>>>>>>>>>>>>> Hi Cristian,
>>>>>>>>>>>>>>>>>>>>>> Without having more details about your concrete heuristic, in my honest opinion such an approach could produce a lot of false positives. I don't know if you are planning to use some "locality" features to detect such coreferences, but you need to take into account that it is quite usual that coreferenced mentions occur even in different paragraphs. Although I'm not an expert in Natural Language Understanding, I would say it is quite difficult to get decent precision/recall rates for coreferencing using fixed rules. Maybe you can give a try to other tools like BART (http://www.bart-coref.org/).
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Rafa Haro
>>>>>>>>>>>>>>>>>>>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>> One of the necessary steps for implementing the Event extraction Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have coreference resolution in the given text. This is provided now via the stanford-nlp project, but as far as I saw this module is performing mostly pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference resolution.
>>>>>>>>>>>>>>>>>>>>>>> In order to get more coreferences from the text I thought of creating some logic that would detect this kind of coreference:
>>>>>>>>>>>>>>>>>>>>>>> "Apple reaches new profit heights. The software company just announced its 2013 earnings."
>>>>>>>>>>>>>>>>>>>>>>> Here "The software company" obviously refers to "Apple". So I'd like to detect coreferences which are of the rdf:type of the Named Entity, in this case "company", and which also have attributes that can be found in the dbpedia categories of the named entity, in this case "software".
>>>>>>>>>>>>>>>>>>>>>>> The detection of coreferences such as "The software company" in the text would also be done either by using the new Pos Tag Based Phrase extraction Engine (noun phrases) or by using a dependency tree of the sentence and picking up only subjects or objects.
>>>>>>>>>>>>>>>>>>>>>>> At this point I'd like to know: would this kind of logic be useful as a separate Enhancement Engine (in case the precision and recall are good enough) in Stanbol?
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Cristian
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11             ++43-699-11108907
> | A-5500 Bischofshofen