That mangled string is the result of reading UTF-8 byte sequences as single-byte characters, e.g. as ISO-8859-1 or some Windows code page.
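A minimal sketch of that effect in Python, for illustration only: it does not involve BaseX, the helper name `misread_utf8` is hypothetical, and cp1252 is an assumed choice of "wrong" encoding. It encodes a string as UTF-8 and then decodes the bytes one byte per character.

```python
def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with a single-byte encoding.

    The default of cp1252 is an assumption; any single-byte code page
    produces the same kind of damage with slightly different characters.
    """
    return text.encode("utf-8").decode(wrong_encoding)


original = "Cañelas"
mangled = misread_utf8(original)
print(mangled)  # CaÃ±elas ("ñ" is the two UTF-8 bytes C3 B1)

# If nothing else has touched the text, the damage is reversible:
# re-encode with the wrong encoding and decode as UTF-8.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True

# The UTF-8 bytes of U+FFFD (the replacement character) are EF BF BD,
# which read as cp1252 give the three characters seen in the report.
print(misread_utf8("\ufffd"))  # ï¿½
```

The `ï¿½` fragment in the exported string matches the last case, which suggests (but does not prove) that a replacement character was already in the data before the final misreading. In that situation the round trip shown above cannot recover the original `ñ`.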
How are you loading it into BaseX? It seems unlikely that BaseX-provided code would make this kind of basic mistake in reading text, but it’s possible that it applied the wrong encoding for some reason.

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com

From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of BitRider001 <bit.rider....@pm.me>
Reply-To: BitRider001 <bit.rider....@pm.me>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <bdysonsm...@gmail.com>
Cc: "email@example.com" <firstname.lastname@example.org>
Subject: Re: [basex-talk] about special characters

Bridger,

Indeed, the file was exported from Excel in UTF-8 encoding. I've tried opening the CSV file using Notepad/Wordpad, and in Linux with vi in a terminal, and in both cases it displays the correct special character. It's only when I load it into a BaseX db and query it that it shows up, as you said, as "mangled". Saving the results into a text file also produces the "mangled" string. Strange.

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:21 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

Bit - that's odd; it looks like the characters are being decomposed (or whatever the term is) and mangled, but I'm not sure, unfortunately. Was the CSV an export from Excel? If so, I suppose this could be a Windows character set problem (cp-1252 or iso-8859-1 or something?).

Bridger

On Thu, May 17, 2018 at 9:11 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi Bridger,

Yes, that is right. I'm on the latest (9.0.1). Attaching a screenshot here for anyone to take a look.

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 8:41 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

Hi Bit - are you using the latest version? There was a problem with 9.0 and some Unicode characters. Christian and co. have a fix in v9.0.1.

HTH,
Bridger

On Thu, May 17, 2018, 7:54 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi,

I just joined the mailing list due to a problem I'm having displaying and storing special characters. I started with a CSV and created a database from it, and the CSV is in UTF-8. However, when I query it, the special characters become garbled. I'm using the GUI in Windows 10.

It starts with this in the CSV:

<name>Cañelas</name>

Then ends up with this when I export the query result into a text file:

<name>ï£¼Caï¿½ï£¼las</name>

Help please.

Bit