Hi,

Stanbol uses the Apache Jena parsers (via Clerezza) for parsing. If you have non-ASCII characters, I recommend storing the file as UTF-8 and telling Stanbol that it is Turtle formatted when you process it. N-Triples is a subset of Turtle, so any N-Triples file is also a valid Turtle file. Unlike the N-Triples format you refer to, however, Turtle is UTF-8 encoded, so characters like ö, ü and ä can be written directly in literals. At least this is the trick I use when loading RDF into a Sesame-based triple store. With Stanbol (Apache Jena based) I have never had a problem like that.
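If your export already contains escaped characters, a rough sketch of the conversion could look like the following. It assumes the CMS export uses the standard N-Triples \uXXXX / \UXXXXXXXX escapes for non-ASCII characters; the function names and the example.org URI are only illustrative and not part of Stanbol or Jena.

```python
import re

# Matches an escaped backslash (left untouched) or a \uXXXX / \UXXXXXXXX escape.
_UCHAR = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchars(line):
    r"""Replace non-ASCII \uXXXX / \UXXXXXXXX escapes with the real characters."""
    def repl(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # An escaped backslash: keep it exactly as written.
            return match.group(0)
        code_point = int(hex_digits, 16)
        if code_point < 0x80:
            # Leave ASCII escapes (quotes, backslashes, ...) alone so that
            # the literal stays syntactically valid.
            return match.group(0)
        return chr(code_point)
    return _UCHAR.sub(repl, line)


def ntriples_to_utf8_turtle(src_path, dst_path):
    """Copy an ASCII N-Triples file to a UTF-8 file that can be parsed as Turtle."""
    with open(src_path, encoding="ascii") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(unescape_uchars(line))


if __name__ == "__main__":
    triple = (r'<http://example.org/j> '
              r'<http://www.w3.org/2000/01/rdf-schema#label> "J\u00E4ger"@de .')
    print(unescape_uchars(triple))
```

The resulting file would then be sent to Stanbol declared as Turtle rather than as N-Triples.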
best
Rupert

On Thu, May 28, 2015 at 7:21 PM, Umutcan Şimşek <umutcan.sim...@mni.thm.de> wrote:
> Hello All,
>
> According to the N-Triples standard [1], it's not allowed to use Extended ASCII
> characters in literals (refer to the EBNF). Therefore, when I extract triples from
> the CMS database, I cannot represent characters like ö ü ä properly. (I replace
> them with a bytecode.)
>
> Can Stanbol process these characters? If I configure the NLP modules for German,
> is it going to be able to recognize, for instance, the word "Jäger"?
>
> [1] http://www.w3.org/2001/sw/RDFCore/ntriples
>
> Best Regards
>
> Umutcan
>

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/