Hi Bit,

The problem may have to do with the character encoding. Try providing the "encoding" option, e.g.

    csv:parse($file, map{ "encoding": "windows-1252" })

I'd also like to call your attention to this module, which provides a way to read Excel files directly from XQuery without the intermediary step of saving to CSV:

https://github.com/eliudmeza/OOXML-Library-XQuery-BaseXdb

I hope this is of some help.

Vincent

From: basex-talk-boun...@mailman.uni-konstanz.de On Behalf Of BitRider001
Sent: Thursday, May 17, 2018 10:56 PM
To: Eliot Kimber <ekim...@contrext.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

Update: I found a way to export the Excel sheet into XML, then created a new database and pointed it to the XML file. This returned the results with the correct special characters. My guess is it may have something to do with the CSV parser.

Thanks,
Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 10:11 AM, BitRider001 <bit.rider....@pm.me> wrote:

Hi Eliot,

I loaded it by first creating a new database and pointing to the CSV file as input. The default encoding, as far as I can tell, is UTF-8, as shown in the attached screenshot. The CSV file was exported from Excel in UTF-8 encoding.

Perplexed,
Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:53 AM, Eliot Kimber <ekim...@contrext.com> wrote:

That mangled string is the result of reading UTF-8 byte sequences as single-byte characters, e.g. ASCII or some Windows code page. How are you loading it into BaseX? It seems unlikely that BaseX-provided code would make this kind of basic mistake in reading text, but it's possible it applied the incorrect encoding for some reason.
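To make the two possible failure modes concrete, here is a small, language-neutral sketch in Python (not BaseX code; the function name is hypothetical) that encodes the sample name from this thread and then decodes it with the wrong charset in each direction. Reading UTF-8 bytes as Windows-1252 turns the two-byte "ñ" into two characters ("CaÃ±elas"). Reading Windows-1252 bytes as UTF-8 instead yields the replacement character U+FFFD where "ñ" was, which is closer to the reported output and would fit Bridger's cp-1252 guess. The thread's output also lacks the following "e"; that detail may depend on how a particular decoder handles the invalid sequence, and this sketch does not reproduce it.

```python
from typing import Tuple


def wrong_charset_roundtrips(text: str) -> Tuple[str, str]:
    """Decode `text` with the wrong charset in both directions (illustration only).

    Returns (utf8_bytes_read_as_cp1252, cp1252_bytes_read_as_utf8).
    """
    utf8_bytes = text.encode("utf-8")      # ñ -> C3 B1 (two bytes)
    cp1252_bytes = text.encode("cp1252")   # ñ -> F1 (one byte)

    # UTF-8 bytes misread as a single-byte Windows code page:
    # each byte of a multi-byte sequence becomes its own character.
    # errors="replace" guards against the few bytes cp1252 leaves undefined.
    as_cp1252 = utf8_bytes.decode("cp1252", errors="replace")

    # Windows-1252 bytes misread as UTF-8: F1 starts a multi-byte sequence
    # that is never completed, so the decoder substitutes U+FFFD.
    as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")
    return as_cp1252, as_utf8


if __name__ == "__main__":
    a, b = wrong_charset_roundtrips("Cañelas")
    print(a)  # CaÃ±elas
    print(b)  # Ca�elas
```

Pure ASCII text comes through unchanged in both directions, which is why only the accented names look wrong.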
Cheers,
Eliot
--
Eliot Kimber
http://contrext.com

From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of BitRider001 <bit.rider....@pm.me>
Reply-To: BitRider001 <bit.rider....@pm.me>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <bdysonsm...@gmail.com>
Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

Bridger,

Indeed, the file was exported from Excel in UTF-8 encoding. I've tried opening the CSV file using Notepad/WordPad, and in Linux with vi in a terminal, and in both situations it displays the correct special character. It's only when I load it into a BaseX database and query it that it shows up, as you said, "mangled". Saving the results into a text file also produces the "mangled" string. Strange.

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:21 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

Bit - that's odd; it looks like the characters are being decomposed (or whatever the term is) and mangled, but I'm not sure, unfortunately. Was the CSV an export from Excel? If so, I suppose this could be a Windows character set problem (cp-1252 or iso-8859-1 or something?).

Bridger

On Thu, May 17, 2018 at 9:11 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi Bridger,

Yes, that is right. I'm on the latest version (9.0.1). Attaching a screenshot here for anyone to take a look.

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 8:41 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

Hi Bit - are you using the latest version? There was a problem with 9.0 and some Unicode characters. Christian and co. have a fix in v9.0.1.
HTH,
Bridger

On Thu, May 17, 2018, 7:54 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi,

I just joined the mailing list due to a problem I'm having displaying and storing special characters. I started with a CSV file, which is in UTF-8, and created a database from it. However, when I query it, the special characters become garbled. I'm using the GUI on Windows 10.

It starts with this in the CSV:

<name>Cañelas</name>

Then ends up with this when I export the query result into a text file:

<name>Ca�las</name>

Help please.

Bit
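One way to settle whether the exported CSV really is UTF-8 is to inspect its raw bytes rather than trusting an editor, since editors often guess the encoding on their own. The Python sketch below is a heuristic, and its helper names are hypothetical (not part of BaseX or Excel): it reports a UTF-8 byte-order mark if present, otherwise tries a strict UTF-8 decode, and falls back to Windows-1252 on the assumption that a non-UTF-8 Excel export uses that code page. The second helper writes a plain UTF-8 copy, which could then be loaded with the UTF-8 default Bit reports seeing in the GUI.

```python
import codecs
from pathlib import Path


def guess_csv_encoding(data: bytes) -> str:
    """Best-guess encoding name for raw CSV bytes (heuristic, not detection)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"        # UTF-8 with a byte-order mark
    try:
        data.decode("utf-8")      # strict: raises on any invalid sequence
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is a Windows-1252 export.
        return "cp1252"


def transcode_to_utf8(src: Path, dst: Path) -> str:
    """Write a UTF-8 copy of src to dst and return the encoding that was assumed."""
    raw = src.read_bytes()
    enc = guess_csv_encoding(raw)
    # Decode and re-encode as bytes so line endings are copied unchanged
    # (text-mode writing could turn "\r\n" into "\r\r\n" on Windows).
    # "utf-8-sig" decoding drops the BOM, so the copy has none.
    dst.write_bytes(raw.decode(enc).encode("utf-8"))
    return enc
```

For example, `transcode_to_utf8(Path("names.csv"), Path("names-utf8.csv"))` (file names hypothetical) would return "cp1252" for a Windows-1252 export. Note that a short Windows-1252 file can occasionally also be valid UTF-8 by coincidence, so treat the result as a hint, not proof.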