That mangled string is the result of reading UTF-8 byte sequences as 
single-byte characters, e.g. ASCII or some Windows code page.
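
To make the mechanism concrete, here is a minimal Python sketch (not BaseX code) of this kind of misreading. It takes the name from the original post, encodes it as UTF-8, and then decodes the same bytes once as Windows-1252 and once as UTF-8. The choice of Windows-1252 is only an assumption for illustration; the actual code page involved is not known from this thread.

```python
# Illustration only: the name comes from the original post, and cp1252
# is an assumed stand-in for "some Windows code page".
original = "Cañelas"

# In UTF-8 the character ñ is stored as two bytes: 0xC3 0xB1.
utf8_bytes = original.encode("utf-8")

# Wrong: treat each byte as one character (single-byte code page).
misread = utf8_bytes.decode("cp1252")

# Right: decode the same bytes as UTF-8.
correct = utf8_bytes.decode("utf-8")

print(misread)   # CaÃ±elas
print(correct)   # Cañelas
```

The two-byte sequence for ñ turns into two unrelated characters when each byte is read on its own, which is the typical signature of this problem.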

 

How are you loading it into BaseX? It seems unlikely that BaseX-provided code 
would make this kind of basic mistake in reading text, but it’s possible that 
it applied the wrong encoding for some reason.

 

Cheers,

 

Eliot

 

--

Eliot Kimber

http://contrext.com

 

 

 

From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of BitRider001 
<bit.rider....@pm.me>
Reply-To: BitRider001 <bit.rider....@pm.me>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <bdysonsm...@gmail.com>
Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

 

Bridger,

 

Indeed the file was exported from Excel in UTF-8 encoding. I've tried opening 
the CSV file using Notepad/Wordpad and in Linux with vi in a terminal and in 
both situations it displays the correct special character.
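
Since editors often guess the encoding when they open a file, a correct display does not by itself prove the bytes are UTF-8. One way to check the raw bytes is a small Python sketch like the one below. The helper name `sniff_utf8` and the file name in the usage comment are hypothetical, and the check is deliberately simple: it only reports whether the data decodes as UTF-8 and whether it starts with a byte order mark.

```python
import codecs


def sniff_utf8(data: bytes) -> str:
    """Report whether raw bytes are valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # At least one byte sequence is illegal in UTF-8, so the file
        # was probably saved in a single-byte code page instead.
        return "not valid utf-8"
    if data.startswith(codecs.BOM_UTF8):
        # Some Windows tools write a byte order mark (EF BB BF) first.
        return "valid utf-8 with BOM"
    return "valid utf-8"


# Usage with a real file (the path is a placeholder):
# with open("names.csv", "rb") as f:
#     print(sniff_utf8(f.read()))
```

Note that a pure-ASCII file also counts as valid UTF-8, so the result is only informative when the file contains characters such as ñ.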

 

It's only when I load it into a BaseX db and query it that it shows up, as 
you said, as "mangled". A text file saved from the results also contains the 
"mangled" string.

 

Strange.

 

Bit

 

 

 

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 9:21 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

 

Bit - 

that's odd; it looks like the characters are being decomposed (or whatever the 
term is) and mangled but I'm not sure, unfortunately. Was the CSV an export 
from Excel? If so, I suppose this could be a Windows character set problem 
(cp-1252 or iso-8859-1 or something?).

 

Bridger

 

On Thu, May 17, 2018 at 9:11 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi Bridger,

 

Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here for 
anyone to take a look.

 

 

Bit

 

 

 

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 8:41 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:

 

Hi Bit - are you using the latest version? There was a problem with 9.0 and 
some Unicode characters. Christian and co. have a fix in v9.0.1.

 

HTH,

Bridger

 

On Thu, May 17, 2018, 7:54 PM BitRider001 <bit.rider....@pm.me> wrote:

Hi,

 

I just joined the mailing list due to a problem I'm having displaying and 
storing special characters.

 

I started with a CSV in UTF-8 and created a database from it. However, when I 
query the database, the special characters become garbled. I'm using the GUI 
in Windows 10.

 

It starts with this in the CSV:

<name>Cañelas</name>

 

Then ends up with this when I export the query result into a text file:

<name>Ca�las</name>

 

 

Help please.

 

Bit

 

 

 

 

 

 

 

 
