Re: stanbol enhancer/entityhub and german characters

Rupert Westenthaler Thu, 28 May 2015 21:10:49 -0700

Hi,

Stanbol uses the Apache Jena parsers (via Clerezza) for parsing. If
you have non-ASCII characters, I recommend storing the file as UTF-8
and telling Stanbol that it is Turtle formatted when you process it.
N-Triples is a subset of Turtle, so any N-Triples file is also a valid
Turtle file. However, Turtle is not limited to ASCII: Turtle files are
UTF-8 encoded, so characters like ö, ü and ä can be written directly.
At least this is the trick I use when loading RDF into a Sesame-based
triple store. With Stanbol (Apache Jena based) I have never had a
problem like that.
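
To illustrate the two representations, here is a minimal Python sketch,
not Stanbol or Jena code. It assumes the replacement mentioned in the
question is the \uXXXX / \UXXXXXXXX escape form that the ASCII-only
N-Triples grammar uses for non-ASCII characters. The function names and
the example.org IRI are made up for the example.

```python
import re


def escape_ntriples(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape.

    Only non-ASCII characters are handled here; quotes, backslashes and
    control characters are assumed to be escaped already.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII stays as it is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # e.g. ä -> \u00E4
        else:
            out.append("\\U%08X" % cp)    # characters outside the BMP
    return "".join(out)


# A backslash followed by u+4 hex digits, U+8 hex digits, or any other char.
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)")


def unescape_ntriples(text: str) -> str:
    """Turn \\uXXXX / \\UXXXXXXXX escapes of non-ASCII characters into real
    characters, so the result can be saved as UTF-8 and loaded as Turtle.

    Escapes of ASCII characters (such as \\u0022) and all other escapes
    (\\\\, \\", \\n, ...) are left untouched so the statement stays valid.
    """
    def repl(match):
        body = match.group(1)
        if len(body) > 1:                 # a \u or \U escape with hex digits
            cp = int(body[1:], 16)
            is_surrogate = 0xD800 <= cp <= 0xDFFF
            if 0x80 <= cp <= 0x10FFFF and not is_surrogate:
                return chr(cp)
        return match.group(0)
    return _ESCAPE.sub(repl, text)


if __name__ == "__main__":
    escaped = ('<http://example.org/e1> '
               '<http://www.w3.org/2000/01/rdf-schema#label> '
               '"J\\u00E4ger"@de .')
    # Write the unescaped statement as UTF-8 before loading it as Turtle.
    print(unescape_ntriples(escaped))
```

The output of unescape_ntriples is what I would save as a UTF-8 file and
load as Turtle. escape_ntriples goes the other way, for tools that only
accept ASCII N-Triples.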


best
Rupert

On Thu, May 28, 2015 at 7:21 PM, Umutcan Şimşek
<[email protected]> wrote:
> Hello All,
>
> According to the N-Triples standard [1], it's not allowed to use extended
> ASCII characters in literals (see the EBNF). Therefore, when I extract
> triples from the CMS database, I cannot represent characters like ö ü ä
> properly. (I replace them with a bytecode.)
>
> Can Stanbol process these characters? If I configure NLP modules for German,
> is it going to be able to recognize, for instance, the word "Jäger"?
>
> [1] http://www.w3.org/2001/sw/RDFCore/ntriples
>
> Best Regards
>
> Umutcan
>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
