Re: problem with entity recognition or linking in french

Rupert Westenthaler Thu, 18 Apr 2013 06:56:02 -0700

On Thu, Apr 18, 2013 at 3:16 PM, Joseph M'Bimbi-Bene
<[email protected]> wrote:
> I don't see the option, can you give me the procedure or a more precise
> indication please ?
>


If you do not want to use POS tagging, then the options are limited:

* uc {NONE/MATCH/LINK}::string - the Upper Case Token Mode configures
how upper case words are treated. There are three possible modes:
(1) NONE: they are not treated specially; (2) MATCH: they are
considered matchable tokens (independent of the POS tag or the token
length); (3) LINK: they are in any case linked with the vocabulary.
The default is "LINK", as upper case words often represent named
entities, with the exception of German ('de'), where the mode is set
to MATCH, as all nouns in German are upper case.

e.g.

org.apache.stanbol.enhancer.engines.keywordextraction.processedLanguages=["fr;uc\=MATCH"]
enhancer.engines.linking.minSearchTokenLength=3

This would MATCH all upper case words, as well as words with three or
more chars.
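
To make the three modes concrete, here is a rough Python sketch of how
a token might be classified when no POS tags are available. It is only
a simplified reading of the description above, not Stanbol's actual
implementation; the function name classify_token and the return values
"link", "match" and "ignore" are made up for illustration.

```python
def classify_token(token, uc_mode="LINK", min_length=3):
    """Hypothetical, simplified model of the Upper Case Token Mode.

    Assumption: "upper case word" means the token starts with an
    upper case letter, and min_length plays the role of
    minSearchTokenLength. This is NOT Stanbol code.
    """
    is_upper = token[:1].isupper()
    if is_upper and uc_mode == "LINK":
        # LINK: upper case words are linked with the vocabulary
        return "link"
    if is_upper and uc_mode == "MATCH":
        # MATCH: matchable regardless of POS tag or token length
        return "match"
    if len(token) >= min_length:
        # any other token is only considered if it is long enough
        return "match"
    return "ignore"


# Example with the settings from the config above (uc=MATCH, length 3)
for word in ["Paris", "Le", "ville", "de"]:
    print(word, classify_token(word, uc_mode="MATCH", min_length=3))
```

With uc=MATCH, a short capitalised word such as "Le" is still
considered, while a short lower case word such as "de" is skipped.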

However, if your vocabulary contains Entities that would appear in
texts as a specific POS (e.g. Nouns), I would really recommend that
you give POS tagging a try.

If you like, you can try to process some of your texts using one of
the following chains:

* DBpedia proper noun linking on
http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun
* Freebase proper noun linking currently running in an early test
version on http://dev.iks-project.eu:8083/enhancer/chain/freebase-proper-noun

Both chains use the Talismane integration [1] for NLP processing.
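
A chain can be called by POSTing plain text to its URL and asking for
an RDF serialisation in the Accept header. The following Python sketch
only builds such a request; the helper names build_enhance_request and
send are made up for illustration, the choice of text/turtle as the
response format is an assumption, and the demo servers above may not
be reachable.

```python
import urllib.request


def build_enhance_request(chain_url, text, accept="text/turtle"):
    """Build (but do not send) a POST request for an enhancement chain.

    Assumption: the chain accepts UTF-8 plain text in the request body
    and returns the enhancement results as RDF in the requested format.
    """
    return urllib.request.Request(
        chain_url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": accept,
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the response body as a string."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_enhance_request(
    "http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun",
    "Paris est la capitale de la France.",
)
# print(send(request))  # uncomment to actually contact the server
```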

best
Rupert

> best
> Rupert
>
>
> [1] https://github.com/westei/stanbol-talismane
> [2] http://dev.iks-project.eu:8081/enhancer/chain/NIF-demo
> [3]
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#linking-process
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen

