Re: problem with entity recognition or linking in french

Joseph M'Bimbi-Bene Thu, 18 Apr 2013 05:57:53 -0700

Hello, thank you for your fast answer.
I use langdetect + opennlp-token + myEngine. I know French is not supported,
and I use the default tokenizer on purpose. I'm using a custom vocabulary,
some of my entity labels span several words, and I also want
to detect scribal abbreviations (thank you, Wikipedia), so I don't want
any chunking or phrase segmentation. I also think POS tagging is just going
to get in my way, since I have no idea which POS tag will be assigned to some of
the labels.



2013/4/18 Rupert Westenthaler <[email protected]>

> Hi Joseph,
>
> What engines do you use for NLP processing of French texts? OpenNLP
> has no models for French, so if you just configure those engines you
> will have tokens, but no detected sentences, POS tags, or NER
> annotations. In this case the EntityhubLinkingEngine falls back to
> linking all tokens of the text whose length is >= "Min Search Token
> Length" (default = 3) with the vocabulary. So assuming that your
> configuration of the EnhancementChain is as described, "plombier" and
> "moustachu" should be linked with the vocabulary.
>
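A minimal sketch of the fallback described above, assuming a plain regex split as a stand-in for the OpenNLP tokenizer. The function name `tokens_to_look_up` and the constant are hypothetical, not Stanbol identifiers; only the default of 3 comes from the message.

```python
import re

# Default "Min Search Token Length" as stated in the message above.
MIN_SEARCH_TOKEN_LENGTH = 3


def tokens_to_look_up(text, min_length=MIN_SEARCH_TOKEN_LENGTH):
    """Return the tokens that would trigger a vocabulary lookup
    when no POS annotations are available (length-only rule)."""
    # Simplistic word split; the real tokenizer behaves differently.
    tokens = re.findall(r"\w+", text)
    return [token for token in tokens if len(token) >= min_length]


selected = tokens_to_look_up("le plombier moustachu")
print(selected)
```

Under this rule "le" is too short to trigger a lookup on its own, while "plombier" and "moustachu" both qualify.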
> BTW: If you are interested in processing French texts with Stanbol you
> should consider using the Stanbol Talismane integration [1]
>
> Problems can also arise with very short texts (1) because the language
> might not be correctly detected and (2) POS and NER annotations do not
> work very well in such scenarios. So please check what language was
> detected for your input. If it was classified as one of the
> supported ones (e.g. pt) you might also get unexpected results.
>
> Regarding the matching of skos:altLabel: The EntityhubLinkingEngine
> links only to a single field. By default this is set to rdfs:label. If
> you want to match against both skos:prefLabel and skos:altLabel, then
> there are two possibilities: (1) copy the values of both skos:prefLabel
> and skos:altLabel to rdfs:label and configure rdfs:label for the
> engine, or (2) configure two instances of the EntityhubLinkingEngine: one
> for skos:prefLabel and the other for skos:altLabel.
>
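Possibility (1) could be done as a preprocessing step on the vocabulary before it is loaded. The sketch below is only an illustration under assumptions: triples are modeled as plain Python tuples rather than with an RDF library, and the function name and the `urn:example:` subject are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def copy_labels_to_rdfs_label(triples):
    """Return the triples plus an rdfs:label copy of every
    skos:prefLabel and skos:altLabel value (no duplicates added)."""
    result = list(triples)
    seen = set(triples)
    for subject, predicate, value in triples:
        if predicate in (SKOS_PREF_LABEL, SKOS_ALT_LABEL):
            copied = (subject, RDFS_LABEL, value)
            if copied not in seen:
                seen.add(copied)
                result.append(copied)
    return result


vocabulary = [
    ("urn:example:plumber", SKOS_PREF_LABEL, "plombier"),
    ("urn:example:plumber", SKOS_ALT_LABEL, "plombier moustachu"),
]
merged = copy_labels_to_rdfs_label(vocabulary)
```

After this step the engine can stay configured for rdfs:label and still see both kinds of labels.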
> If you want to know what happens ...
>
> (1) you can add a logger configuration that sets the log level
> for "org.apache.stanbol.enhancer.engines.entitylinking" to DEBUG. For
> that, go to the "Configuration" tab of the Felix Web Console and add a
> new "Apache Sling Logging Logger Configuration". At DEBUG level,
> detailed information about the linking process is printed to the log.
>
> (2) if you want detailed information about the NLP processing results
> to be added to the enhancement results you can add the nlp2rdf
> enhancement engine to your Stanbol instance and your enhancement
> chain. For that you first need to install the bundle of this engine in
> the Stanbol environment (e.g. by using the Bundles tab of the Felix
> Web Console) and after that add the engine to your chain configuration.
> This engine writes detailed information about the NLP processing
> results. You can test it on [2]
>
>
> On Wed, Apr 17, 2013 at 4:16 PM, Joseph M'Bimbi-Bene
> <[email protected]> wrote:
> > Also, why is "le plombier moustachu" recognized? Why is there a
> > difference?
>
> No idea. Maybe the detected language changes when a word is added.
>
> >
> > Another related question is: what is the POS type of a token when I
> > deactivate POS tagging?
>
> Then there are simply no POS annotations and the length of the words
> is used to decide if they are linked or not. Note that regardless of
> that, upper case words always trigger searches in the linked
> vocabulary.
>
> > Are they all proper nouns? What happens? How can I configure that?
>
> The EntityLinking engine distinguishes:
>
> * Linkable Tokens: These are words that are linked with the vocabulary.
> This means that the engine will issue queries against the controlled
> vocabulary for those tokens.
> * Matchable Tokens: Matchable tokens are used to refine queries. For
> the matching of entity labels with the text, those words are treated in
> the same way as linkable words. So the main difference is that
> matchable words alone will not cause the engine to query for entities
> in the controlled vocabulary.
> * Other Tokens: All other tokens in the text are not used for searches
> in the configured vocabulary. However, during the matching of labels
> with the text they are considered, as they might also be present in
> the labels of entities.
>
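The three token classes above can be illustrated with a heavily simplified sketch. It assumes the classification is already given as input, models a "query" as a substring check over a small label list, and uses hypothetical names (`TokenClass`, `link`) that are not part of Stanbol; the real engine is considerably more involved.

```python
from enum import Enum


class TokenClass(Enum):
    LINKABLE = "linkable"    # triggers a vocabulary query
    MATCHABLE = "matchable"  # refines matches, never queries alone
    OTHER = "other"          # only compared against label words


def link(tokens, classes, labels):
    """Return the labels found in the text. Only linkable tokens start a
    lookup; every token class takes part in matching the label words."""
    lowered = [token.lower() for token in tokens]
    found = []
    for i, token_class in enumerate(classes):
        if token_class is not TokenClass.LINKABLE:
            continue  # matchable and other tokens never trigger a query
        # Stand-in for the vocabulary query: labels containing this token.
        candidates = [l for l in labels if lowered[i] in l.lower().split()]
        for label in candidates:
            words = label.lower().split()
            n = len(words)
            # Try every text window of n tokens that covers position i.
            for start in range(max(0, i - n + 1), i + 1):
                if lowered[start:start + n] == words and label not in found:
                    found.append(label)
    return found


tokens = ["le", "plombier", "moustachu"]
classes = [TokenClass.OTHER, TokenClass.LINKABLE, TokenClass.MATCHABLE]
labels = ["le plombier moustachu", "plombier", "moustachu"]
matches = link(tokens, classes, labels)
```

In this toy setup the label "moustachu" is not returned, because its only word is matchable and so never causes a query on its own, while "le" still counts when matching the three-word label.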
> The rules for classifying words as Linkable and Matchable can be
> controlled by the configuration of the EntityLinkingEngine. You can
> find details about that in the documentation at [3]
>
> best
> Rupert
>
>
> [1] https://github.com/westei/stanbol-talismane
> [2] http://dev.iks-project.eu:8081/enhancer/chain/NIF-demo
> [3]
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#linking-process
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
