On Mon, Jun 3, 2013 at 3:32 PM, Mohammad Benslimne <mohammad.benslim...@gmail.com> wrote:
> Thank you Rafa and Rupert for your responses.
>
> I have more queries:
> - Can Span annotations overlap or interlock?

Yes, and this can easily happen if several NLP frameworks process the same text. The AnalysedText ContentPart [1] only ensures that you cannot add the same Span (same start, end, and span type) twice. If you try to do that, it returns the existing instance instead.

> - Can the concept-uri be any custom URI, or should it define a schema?

You can use any URI. Stanbol does not really care about the schema used, but users might, so I would suggest using meaningful URIs. If possible, use URLs that are de-referenceable (like dbpedia.org/resource/Paris, http://sws.geonames.org/3324203/, http://rdf.freebase.com/ns/m.04dz3p6).

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext

> Thanks in advance
> Mohammad
>
> On 3 June 2013 03:10, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>
>> Hi Mohammad,
>>
>> As Rafa stated, for the "OpenNLP Custom NER Model" you need to train
>> your own OpenNLP NER model. Once you have done that, just copy the model
>> to the 'stanbol/datafiles' directory. After that you can configure the
>> "OpenNLP Custom NER Model" engine by providing
>>
>> * a name
>> * the name of the model file (in the 'stanbol/datafiles' directory)
>> * the type mappings ({ner-type} > {concept-uri}), where {ner-type} is
>>   the name of the entities in the training set (the <START:{ner-type}>
>>   <END> annotations) and {concept-uri} is the URI used as the value of
>>   the dc:type properties added to fise:TextAnnotations
>>
>> best
>> Rupert
>>
>> On Thu, May 30, 2013 at 7:39 PM, Rafa Haro <rh...@zaizi.com> wrote:
>> > Hi Mohammad,
>> >
>> > Maybe your question is more suitable for the OpenNLP mailing list, but
>> > I can try to help you. First you need to clarify whether you want to
>> > build a document classifier or an enhancer, because a document
>> > classifier may not really fit what an enhancement means in Stanbol.
>> >
>> > If you want to build your own "concept" or named entity recognition
>> > engine, you have several options. Maybe the easiest one is to train
>> > a custom OpenNLP NER model and then integrate it into an engine in
>> > Stanbol. You can follow the OpenNLP documentation for that [1]. You
>> > would need some custom training data for your problem domain.
>> >
>> > On the other hand, if you have your own dataset or vocabulary and you
>> > want to link surface forms or concept mentions in text with that
>> > dataset, you should create an EntityHub site for your data and
>> > configure a new Entity Linking engine. For that you can follow a quite
>> > helpful guide on the Stanbol website [2].
>> >
>> > I hope these two links are useful for your first steps.
>> >
>> > Cheers
>> >
>> > [1] - http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind
>> > [2] - https://stanbol.apache.org/docs/trunk/customvocabulary.html
>> >
>> > On Thursday, May 30, 2013, Mohammad Benslimne wrote:
>> >
>> >> Hello folks,
>> >>
>> >> I am developing a document classifier/extractor for my undergraduate
>> >> project. I would like to use your tools, especially the OpenNLP
>> >> Custom NER Model extraction engine, to define what kind of data to
>> >> extract.
>> >> Can you please give me examples of how to make it work?
>> >> How can I make my own name finder models and type mappings?
>> >>
>> >> Thanks in advance for your precious hints
>> >>
>> >> Regards,
>> >> Med
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>
> --
> Mohammad

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
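
To illustrate the type mappings discussed in this thread ({ner-type} > {concept-uri}), here is a minimal Python sketch. It is not Stanbol or OpenNLP code: the function names, the sample sentence, and the two DBpedia ontology URIs are assumptions chosen purely for illustration. It parses mapping lines and looks up the concept URI for each <START:{ner-type}> ... <END> annotation in an OpenNLP-style training sentence.

```python
import re

# Hypothetical mapping lines in the "{ner-type} > {concept-uri}" form.
# The URIs are example values only; any URI could be used here.
TYPE_MAPPINGS = """
person > http://dbpedia.org/ontology/Person
city > http://dbpedia.org/ontology/Place
"""

# Made-up sentence using OpenNLP-style <START:{ner-type}> ... <END> markup.
SENTENCE = "<START:person> Alice <END> moved to <START:city> Paris <END> ."

# Matches one annotated mention: group 1 is the ner-type, group 2 the text.
ANNOTATION = re.compile(r"<START:([^>]+)>\s+(.*?)\s+<END>")


def parse_mappings(text):
    """Turn '{ner-type} > {concept-uri}' lines into a dict."""
    mappings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ner_type, _, uri = line.partition(">")
        mappings[ner_type.strip()] = uri.strip()
    return mappings


def extract_mentions(sentence, mappings):
    """Return (mention, ner-type, concept-uri) for each annotation.

    The concept-uri is None when the ner-type has no mapping.
    """
    return [
        (m.group(2), m.group(1), mappings.get(m.group(1)))
        for m in ANNOTATION.finditer(sentence)
    ]


if __name__ == "__main__":
    for mention in extract_mentions(SENTENCE, parse_mappings(TYPE_MAPPINGS)):
        print(mention)
```

In the engine itself, the URI found this way is what would be written as the dc:type value of the fise:TextAnnotation, as described above. Returning None for an unmapped ner-type is a choice made for this sketch; the thread does not say how the engine handles that case.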