On Mon, Jun 3, 2013 at 3:32 PM, Mohammad Benslimne
<mohammad.benslim...@gmail.com> wrote:
> Thank you Rafa and Rupert for your responses.
>
> I have some more questions:
> - Can Span annotations overlap or be interleaved?

Yes, and this can easily happen if several NLP frameworks process the
same text. The AnalysedText ContentPart [1] only ensures that you
cannot add the same Span (same start, end and span type) twice. If you
try to do that, it returns the existing instance instead.
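
To make the rule above concrete, here is a minimal sketch in Python. It is a
toy model only, not the Stanbol Java API: the names ToyAnalysedText, Span and
add_span are hypothetical and exist just to show "same (start, end, type)
returns the existing instance, overlapping spans are fine".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a span: character offsets plus a type."""
    start: int
    end: int
    span_type: str


class ToyAnalysedText:
    """Toy container (an assumption, not Stanbol code) keyed by (start, end, type)."""

    def __init__(self, text):
        self.text = text
        self._spans = {}

    def add_span(self, start, end, span_type):
        key = (start, end, span_type)
        # setdefault keeps the first Span stored under this key and returns it,
        # so adding an identical span a second time yields the existing instance.
        return self._spans.setdefault(key, Span(start, end, span_type))

    def spans(self):
        # Overlapping spans simply coexist; order them by their offsets.
        return sorted(self._spans.values(), key=lambda s: (s.start, s.end))


at = ToyAnalysedText("Paris is the capital of France")
a = at.add_span(0, 5, "Chunk")
b = at.add_span(0, 5, "Chunk")   # identical start, end and type
c = at.add_span(0, 8, "Chunk")   # overlaps a, but has a different end
d = at.add_span(0, 5, "Token")   # same offsets as a, different type
```

In this sketch a and b are the same object, while c and d are stored as
separate spans even though they overlap a.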

> - Can the concept-uri be any custom URI, or must it follow a defined schema?

You can use any URI. Stanbol does not really care about the schema
used, but users might, so I would suggest using meaningful URIs. If
possible, you should use URLs that are de-referenceable (like
http://dbpedia.org/resource/Paris, http://sws.geonames.org/3324203/,
http://rdf.freebase.com/ns/m.04dz3p6)
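
As an illustration of the "{ner-type} > {concept-uri}" mappings mentioned in
the quoted message below, here is a small Python sketch that parses such lines
and rejects values that are not absolute URIs. The helper
parse_type_mappings and the ner-type names "person" and "location" are
assumptions made up for this example; Stanbol itself does not require this
check.

```python
from urllib.parse import urlparse


def parse_type_mappings(lines):
    """Parse '{ner-type} > {concept-uri}' lines into a dict (hypothetical helper)."""
    mappings = {}
    for line in lines:
        # Split on the first '>' only; everything after it is the URI.
        ner_type, sep, uri = line.partition(">")
        ner_type, uri = ner_type.strip(), uri.strip()
        if not sep or not ner_type or not uri:
            raise ValueError(f"expected '{{ner-type}} > {{concept-uri}}', got: {line!r}")
        # A URI without a scheme (e.g. 'example.org/Person') is not absolute.
        if not urlparse(uri).scheme:
            raise ValueError(f"concept-uri is not an absolute URI: {uri!r}")
        mappings[ner_type] = uri
    return mappings


# Example mappings; the ner-type names are assumed to match the training data.
type_mappings = parse_type_mappings([
    "person > http://dbpedia.org/ontology/Person",
    "location > http://dbpedia.org/ontology/Place",
])
```

Any URI scheme passes this check; whether the URI is actually
de-referenceable is left to the person choosing it.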

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext

>
>
> Thanks in advance
> Mohammad
>
>
>
> On 3 June 2013 03:10, Rupert Westenthaler
> <rupert.westentha...@gmail.com> wrote:
>
>> Hi Mohammad,
>>
>> As Rafa stated, for the "OpenNLP Custom NER Model" you need to train
>> your own OpenNLP NER model. Once you have done that, just copy the model
>> to the 'stanbol/datafiles' directory. After that you can configure the
>> "OpenNLP Custom NER Model" engine by providing
>>
>> * a name
>> * the name of the model file (in the 'stanbol/datafiles' directory)
>> * the type mappings ({ner-type} > {concept-uri}), where {ner-type} is
>> the name of the entities in the training set (the <START:{ner-type}>
>> <END> annotations) and {concept-uri} is the URI used as the value of
>> the dc:type properties added to fise:TextAnnotations
>>
>> best
>> Rupert
>>
>>
>> On Thu, May 30, 2013 at 7:39 PM, Rafa Haro <rh...@zaizi.com> wrote:
>> > Hi Mohammad,
>> >
>> > Maybe your question is more suitable for the OpenNLP mailing list, but I
>> > can try to help you. First you need to clarify whether you want to build
>> > a document classifier or an enhancer, because a document classifier may
>> > not really fit what an enhancement means in Stanbol.
>> >
>> > If you want to build your custom "concept" or named entity recognition
>> > engine, you have several options. Maybe the easiest one is to train
>> > your custom OpenNLP NER model and then integrate it in an engine in
>> > Stanbol. You can follow the OpenNLP documentation for that [1]. You
>> > would need some custom training data for your problem domain.
>> >
>> > On the other hand, if you have your own dataset or vocabulary and you
>> > want to link surface forms or concept mentions in text with that dataset,
>> > you should create an EntityHub site for your data and configure a new
>> > Entity Linking engine. For that you can follow a quite helpful guide on
>> > the Stanbol website [2].
>> >
>> > I hope these two links are useful for your first steps.
>> >
>> > Cheers
>> >
>> > [1] - http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind
>> > [2] - https://stanbol.apache.org/docs/trunk/customvocabulary.html
>> >
>> > On Thursday, 30 May 2013, Mohammad Benslimne wrote:
>> >
>> >> Hello folks,
>> >>
>> >> I am developing a document classifier/extractor for my undergraduate
>> >> project.
>> >> I would like to use your tools, especially the OpenNLP Custom NER Model
>> >> extraction engine, to define what kind of data to extract.
>> >> Can you please give me examples of how to make it work?
>> >> How can I create my own name finder models and type mappings?
>> >>
>> >> Thanks in advance for your precious hints
>> >>
>> >>
>> >> Regards,
>> >> Med
>> >>
>> >
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
>
>
> Mohammad



--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
