[ 
https://issues.apache.org/jira/browse/STANBOL-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497642#comment-14497642
 ] 

Rupert Westenthaler commented on STANBOL-1417:
----------------------------------------------

First, a request with no Content-Language header to a chain that contains only 
the language detection engine:
{code}
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     --data "The Stanbol enhancer can detect famous cities such as \
             Paris and people such as Bob Marley." \
     http://localhost:8088/enhancer/chain/STANBOL-1417-test
<urn:enhancement-785913f9-6716-9530-2098-5a3fb8ec8463>
      a       <http://fise.iks-project.eu/ontology/TextAnnotation> ,
              <http://fise.iks-project.eu/ontology/Enhancement> ;
      <http://fise.iks-project.eu/ontology/confidence>
              "0.9999978087054073"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://fise.iks-project.eu/ontology/extracted-from>
              <urn:content-item-sha1-7c31e64955afb9f4a09e72075ac48125de156c94> ;
      <http://purl.org/dc/terms/created>
              "2015-04-16T06:45:29.185Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
      <http://purl.org/dc/terms/creator>
              "org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine"^^<http://www.w3.org/2001/XMLSchema#string> ;
      <http://purl.org/dc/terms/language>
              "en" ;
      <http://purl.org/dc/terms/type>
              <http://purl.org/dc/terms/LinguisticSystem> .
{code}

The response contains a single Language Annotation for English with a 
confidence of {{0.9999978}}.

To demonstrate this feature, the next request explicitly passes the 
{{Content-Language: de}} header (for German).

{code}
curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     -H "Content-Language: de" \
     --data "The Stanbol enhancer can detect famous cities such as \
            Paris and people such as Bob Marley." \
     http://localhost:8088/enhancer/chain/STANBOL-1417-test
<urn:enhancement-29accc13-832f-e14c-73e5-a17484787a4d>
      a       <http://fise.iks-project.eu/ontology/TextAnnotation> ,
              <http://fise.iks-project.eu/ontology/Enhancement> ;
      <http://fise.iks-project.eu/ontology/confidence>
              "0.999995337655101"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://fise.iks-project.eu/ontology/extracted-from>
              <urn:content-item-sha1-7c31e64955afb9f4a09e72075ac48125de156c94> ;
      <http://purl.org/dc/terms/created>
              "2015-04-16T06:45:52.291Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
      <http://purl.org/dc/terms/creator>
              "org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine"^^<http://www.w3.org/2001/XMLSchema#string> ;
      <http://purl.org/dc/terms/language>
              "en" ;
      <http://purl.org/dc/terms/type>
              <http://purl.org/dc/terms/LinguisticSystem> .

<urn:enhancement-571f3031-badc-6d5a-753f-376c5b334c95>
      a       <http://fise.iks-project.eu/ontology/TextAnnotation> ,
              <http://fise.iks-project.eu/ontology/Enhancement> ;
      <http://fise.iks-project.eu/ontology/confidence>
              "1.0"^^<http://www.w3.org/2001/XMLSchema#float> ;
      <http://fise.iks-project.eu/ontology/extracted-from>
              <urn:content-item-sha1-7c31e64955afb9f4a09e72075ac48125de156c94> ;
      <http://purl.org/dc/terms/created>
              "2015-04-16T06:45:52.288Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
      <http://purl.org/dc/terms/creator>
              "Content-Language Header of the request"^^<http://www.w3.org/2001/XMLSchema#string> ;
      <http://purl.org/dc/terms/language>
              "de" ;
      <http://purl.org/dc/terms/type>
              <http://purl.org/dc/terms/LinguisticSystem> .
{code}

Now the response contains two Language Annotations: the English one from the 
language detection engine, which was also present in the first response, and a 
second one for German. 

The Language Annotation for German uses "Content-Language Header of the 
request" as dc:creator and has a confidence of 1.0.
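
To illustrate why the confidence of 1.0 matters, here is a minimal Python sketch of how a consumer might choose the content language. It assumes that consumers take the Language Annotation with the highest confidence and fall back to the ContentItem's dc:language only when no annotation exists. The {{LanguageAnnotation}} class and the {{select_language}} function are hypothetical names used for illustration and are not part of the Stanbol API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LanguageAnnotation:
    """Hypothetical stand-in for a fise:TextAnnotation typed dc:LinguisticSystem."""
    language: str      # value of dc:language
    confidence: float  # value of fise:confidence
    creator: str       # value of dc:creator


def select_language(annotations: Iterable[LanguageAnnotation],
                    content_item_language: Optional[str] = None) -> Optional[str]:
    """Return the language of the most confident annotation.

    Assumption: the highest confidence wins, and the ContentItem's
    dc:language is used only when no annotation is present.
    """
    best = max(annotations, key=lambda a: a.confidence, default=None)
    return best.language if best is not None else content_item_language


# Values copied from the second response above.
annotations = [
    LanguageAnnotation(
        "en", 0.999995337655101,
        "org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine"),
    LanguageAnnotation("de", 1.0, "Content-Language Header of the request"),
]

# With the header annotation present, German wins.
print(select_language(annotations))
# With only the detected annotation, dc:language "de" would be ignored.
print(select_language(annotations[:1], content_item_language="de"))
# With no annotation at all, the dc:language fallback applies.
print(select_language([], content_item_language="de"))
```

Under these assumptions, the second call shows the behaviour before this change: a bare dc:language value loses to any detected language, however the user set the header.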

> Create Language Annotation for parsed "Content-Language" header
> ---------------------------------------------------------------
>
>                 Key: STANBOL-1417
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1417
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancement Engines
>    Affects Versions: 0.12.0
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>             Fix For: 1.0.0, 0.12.1
>
>
> Since STANBOL-660, Stanbol supports passing the language of the content via 
> the "Content-Language" header. However, currently only the `dc:language` 
> property is set for the ContentItem.
> Based on the specification of STANBOL-613, this information is only used as 
> a fallback if no language annotation is present in the ContentItem. So as 
> soon as any Language Identification Engine is present in the Chain, the 
> "Content-Language" passed by the user gets ignored. This is not what a user 
> who explicitly passes the language intends.
> To force Stanbol to use the passed language, a Language Annotation with 
> confidence 1.0 needs to be added to the metadata of the ContentItem instead. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
