Update:

I found a way to export the Excel sheet to XML, then created a new database 
and pointed it at the XML file. This returned the results with the correct 
special characters.
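
For anyone who wants to script the same workaround instead of exporting from Excel by hand, here is a minimal Python sketch. It is not what I actually did, just an illustration of the idea: decode the CSV explicitly as UTF-8 and write XML. The function name `csv_to_xml` and the element names `csv` and `record` are made up for this example, and it assumes the column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_bytes: bytes) -> bytes:
    """Convert UTF-8 CSV bytes into a UTF-8 encoded XML document."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark,
    # which Excel writes when saving as "CSV UTF-8".
    text = csv_bytes.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""))

    root = ET.Element("csv")  # hypothetical root element name
    for row in reader:
        record = ET.SubElement(root, "record")  # hypothetical row element
        for column, value in row.items():
            # Assumes each header is a valid XML element name.
            ET.SubElement(record, column).text = value

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


sample = "name,city\nCañelas,Madrid\n".encode("utf-8")
print(csv_to_xml(sample).decode("utf-8"))
```

Because the decoding step names the encoding explicitly, nothing downstream has to guess it.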

My guess is it may have something to do with the CSV Parser.
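
To illustrate the mechanism Eliot describes below (UTF-8 bytes read as single-byte characters), here is a small Python sketch. It only shows that kind of mangling in general. Which code page was applied in my case, if any, is an assumption: cp1252 is used here because Bridger mentioned it.

```python
original = "Cañelas"

# "ñ" is two bytes in UTF-8 (0xC3 0xB1).
utf8_bytes = original.encode("utf-8")

# Reading those bytes with a single-byte code page turns each byte
# into its own character, so "ñ" becomes two unrelated characters.
mangled = utf8_bytes.decode("cp1252")
print(mangled)  # CaÃ±elas

# As long as no bytes were lost, the damage can be reversed.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired)  # Cañelas
```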

Thanks,
Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 10:11 AM, BitRider001 <bit.rider....@pm.me> wrote:

> Hi Eliot,
>
> I loaded it by first creating a new database and pointing to the CSV file as 
> input. The default encoding as far as I can tell is UTF-8 as shown in the 
> attached screenshot. The CSV file was exported from Excel in UTF-8 encoding.
>
> Perplexed,
> Bit
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On May 18, 2018 9:53 AM, Eliot Kimber <ekim...@contrext.com> wrote:
>
>> That mangled string is the result of reading UTF-8 byte sequences as 
>> single-byte characters, e.g. ASCII or some Windows code page.
>>
>> How are you loading it into BaseX? It seems unlikely that BaseX-provided 
>> code would make this kind of basic mistake in reading text but it’s possible 
>> it applied the incorrect encoding for some reason.
>>
>> Cheers,
>>
>> Eliot
>>
>> --
>>
>> Eliot Kimber
>>
>> http://contrext.com
>>
>> From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of BitRider001 
>> <bit.rider....@pm.me>
>> Reply-To: BitRider001 <bit.rider....@pm.me>
>> Date: Thursday, May 17, 2018 at 8:34 PM
>> To: Bridger Dyson-Smith <bdysonsm...@gmail.com>
>> Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
>> Subject: Re: [basex-talk] about special characters
>>
>> Bridger,
>>
>> Indeed the file was exported from Excel in UTF-8 encoding. I've tried 
>> opening the CSV file using Notepad/Wordpad and in Linux with vi in a 
>> terminal and in both situations it displays the correct special character.
>>
>> It's only when I load it into a BaseX db and query it that it shows up, as 
>> you said, "mangled". The text file I get when saving the results also 
>> contains the "mangled" string.
>>
>> Strange.
>>
>> Bit
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>
>> On May 18, 2018 9:21 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:
>>
>>> Bit -
>>>
>>> that's odd; it looks like the characters are being decomposed (or whatever 
>>> the term is) and mangled but I'm not sure, unfortunately. Was the CSV an 
>>> export from Excel? If so, I suppose this could be a Windows character set 
>>> problem (cp-1252 or iso-8859-1 or something?).
>>>
>>> Bridger
>>>
>>> On Thu, May 17, 2018 at 9:11 PM BitRider001 <bit.rider....@pm.me> wrote:
>>>
>>>> Hi Bridger,
>>>>
>>>> Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here 
>>>> for anyone to take a look.
>>>>
>>>> Bit
>>>>
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>
>>>> On May 18, 2018 8:41 AM, Bridger Dyson-Smith <bdysonsm...@gmail.com> wrote:
>>>>
>>>>> Hi Bit - are you using the latest version? There was a problem with 9.0 
>>>>> and some Unicode characters. Christian and co. have a fix in v9.0.1.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Bridger
>>>>>
>>>>> On Thu, May 17, 2018, 7:54 PM BitRider001 <bit.rider....@pm.me> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I just joined the mailing list due to a problem I'm having displaying 
>>>>>> and storing special characters.
>>>>>>
>>>>>> I started with a CSV and created a database from it and the CSV is in 
>>>>>> UTF-8. However, when I query it, the special characters become garbled. I'm 
>>>>>> using the GUI in Windows 10.
>>>>>>
>>>>>> It starts with this in the CSV:
>>>>>>
>>>>>> <name>Cañelas</name>
>>>>>>
>>>>>> Then ends up with this when I export the query result into a text file:
>>>>>>
>>>>>> <name>Ca�las</name>
>>>>>>
>>>>>> Help please.
>>>>>>
>>>>>> Bit
