That mangled string is the result of reading UTF-8 byte sequences as 
single-byte characters, e.g. ISO-8859-1 or some Windows code page.
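
To make that concrete, here is a minimal Python sketch of the effect. It is only an illustration, not BaseX code: the sample word "café" and the helper names `mangle` and `repair` are made up for this example, and Windows-1252 is assumed as the wrong encoding.

```python
def mangle(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then misread each byte as one character."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(mangled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo mangle(): recover the original bytes, then decode them as UTF-8."""
    return mangled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    sample = "café"            # hypothetical sample, not the poster's data
    broken = mangle(sample)    # the two UTF-8 bytes of "é" become two characters
    print(broken)              # cafÃ©
    print(repair(broken))      # café
```

The round trip only works while every byte survives the wrong decoding, so it is a diagnostic aid rather than a fix for the import itself.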


How are you loading it into BaseX? It seems unlikely that BaseX-provided code 
would make this kind of basic mistake in reading text, but it’s possible that 
it applied the incorrect encoding for some reason.
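
One way to rule out the file itself before looking at the import step is to check its raw bytes. The following is a rough Python sketch under my own assumptions: the function name `describe_encoding` and its return strings are invented for illustration, and it only distinguishes valid UTF-8 (with or without a byte order mark) from everything else.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as UTF-8 with BOM, UTF-8 without BOM, or not UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    # A leading EF BB BF is the UTF-8 byte order mark.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    return "valid utf-8 (no BOM)"


if __name__ == "__main__":
    # "data.csv" is a placeholder path; substitute the real file.
    with open("data.csv", "rb") as handle:
        print(describe_encoding(handle.read()))
```

If this reports valid UTF-8, the file is fine and the wrong encoding is being applied somewhere during loading or export.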







Eliot Kimber




From: <> on behalf of BitRider001 
Reply-To: BitRider001 <>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <>
Cc: "" <>
Subject: Re: [basex-talk] about special characters




Indeed the file was exported from Excel in UTF-8 encoding. I've tried opening 
the CSV file using Notepad/Wordpad and in Linux with vi in a terminal and in 
both situations it displays the correct special character.


It's only when I load it into a BaseX db and query it that it shows up, as 
you said, as "mangled". Saving the results into a text file also produces the 
"mangled" string.








‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 9:21 AM, Bridger Dyson-Smith <> wrote:


Bit - 

that's odd; it looks like the characters are being decomposed (or whatever the 
term is) and mangled but I'm not sure, unfortunately. Was the CSV an export 
from Excel? If so, I suppose this could be a Windows character set problem 
(cp-1252 or iso-8859-1 or something?).




On Thu, May 17, 2018 at 9:11 PM BitRider001 <> wrote:

Hi Bridger,


Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here for 
anyone to take a look.







‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 8:41 AM, Bridger Dyson-Smith <> wrote:


Hi Bit - are you using the latest version? There was a problem with 9.0 and 
some Unicode characters. Christian and co. have a fix in v9.0.1.





On Thu, May 17, 2018, 7:54 PM BitRider001 <> wrote:



I just joined the mailing list due to a problem I'm having displaying and 
storing special characters.


I started with a CSV in UTF-8 and created a database from it. However, when I 
query it, the special characters become garbled. I'm using the GUI in 
Windows 10.


It starts with this in the CSV:



Then ends up with this when I export the query result into a text file:




Help please.










