Hi Bit,

The problem may have to do with the character encoding. Try providing the 
“encoding” option, e.g.

csv:parse($file, map{ "encoding": "windows-1252" })

I’d also like to call your attention to this module which provides a way to 
read Excel files directly from XQuery without the intermediary step of saving 
to CSV.

https://github.com/eliudmeza/OOXML-Library-XQuery-BaseXdb

I hope this is of some help.

Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of BitRider001
Sent: Thursday, May 17, 2018 10:56 PM
To: Eliot Kimber <ekim...@contrext.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

Update:

I found a way to export the Excel sheet to XML, then created a new database 
and pointed it at the XML file. This returned the results with the correct 
special characters.

My guess is it may have something to do with the CSV Parser.

Thanks,
Bit



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 10:11 AM, BitRider001 
<bit.rider....@pm.me> wrote:

Hi Eliot,

I loaded it by first creating a new database and pointing to the CSV file as 
input. The default encoding as far as I can tell is UTF-8 as shown in the 
attached screenshot. The CSV file was exported from Excel in UTF-8 encoding.
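
One way to double-check a file's actual encoding, independent of what any 
editor displays, is to inspect its raw bytes. The Python sketch below is only 
an illustration: `describe_encoding` is a hypothetical helper, not part of 
BaseX, and it can only tell "valid UTF-8" from "not valid UTF-8"; it cannot 
prove which single-byte code page was used.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as UTF-8 (with or without BOM) or not valid UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # Note: pure ASCII data also counts as valid UTF-8.
    return "UTF-8 with BOM" if has_bom else "UTF-8 without BOM"


if __name__ == "__main__":
    # Sample bytes stand in for file contents; for a real file, read it in
    # binary mode, e.g. open(path, "rb").read(), with your own path.
    print(describe_encoding("Cañelas".encode("utf-8")))
    print(describe_encoding("Cañelas".encode("cp1252")))
```

If the file turns out not to be valid UTF-8, an explicit encoding setting on 
import is worth trying.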

Perplexed,
Bit



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:53 AM, Eliot Kimber 
<ekim...@contrext.com> wrote:

That mangled string is the result of reading UTF-8 byte sequences as 
single-byte characters, e.g. ASCII or some Windows code page.
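
A minimal Python sketch of this effect, using the name from the original 
report as sample data. Python is used only for illustration and says nothing 
about how BaseX reads files internally; the function names are hypothetical. 
The sketch also shows the opposite mismatch (Windows-1252 bytes read as 
UTF-8), which produces a replacement character rather than two stray 
characters, so comparing the two may help tell which direction a given 
mangled string came from.

```python
def utf8_read_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def cp1252_read_as_utf8(text: str) -> str:
    """Encode as Windows-1252, then wrongly decode the bytes as UTF-8."""
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    return text.encode("cp1252").decode("utf-8", errors="replace")


if __name__ == "__main__":
    name = "Cañelas"
    # The two UTF-8 bytes of "ñ" (0xC3 0xB1) show up as two characters.
    print(utf8_read_as_cp1252(name))  # CaÃ±elas
    # The single Windows-1252 byte of "ñ" (0xF1) is invalid UTF-8 here.
    print(cp1252_read_as_utf8(name))  # contains U+FFFD in place of ñ
```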

How are you loading it into BaseX? It seems unlikely that BaseX-provided code 
would make this kind of basic mistake in reading text but it’s possible it 
applied the incorrect encoding for some reason.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com



From: <basex-talk-boun...@mailman.uni-konstanz.de> on behalf of BitRider001 
<bit.rider....@pm.me>
Reply-To: BitRider001 <bit.rider....@pm.me>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <bdysonsm...@gmail.com>
Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

Bridger,

Indeed the file was exported from Excel in UTF-8 encoding. I've tried opening 
the CSV file using Notepad/Wordpad and in Linux with vi in a terminal and in 
both situations it displays the correct special character.

It's only when I load it into a BaseX db and query it that it shows up, as 
you said, as "mangled". Saving the results to a text file also produces the 
"mangled" string.

Strange.

Bit



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:21 AM, Bridger Dyson-Smith 
<bdysonsm...@gmail.com> wrote:

Bit -
that's odd; it looks like the characters are being decomposed (or whatever the 
term is) and mangled but I'm not sure, unfortunately. Was the CSV an export 
from Excel? If so, I suppose this could be a Windows character set problem 
(cp-1252 or iso-8859-1 or something?).

Bridger

On Thu, May 17, 2018 at 9:11 PM BitRider001 
<bit.rider....@pm.me> wrote:
Hi Bridger,

Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here for 
anyone to take a look.


Bit



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 8:41 AM, Bridger Dyson-Smith 
<bdysonsm...@gmail.com> wrote:

Hi Bit - are you using the latest version? There was a problem with 9.0 and 
some Unicode characters. Christian and co. have a fix in v9.0.1.

HTH,
Bridger

On Thu, May 17, 2018, 7:54 PM BitRider001 
<bit.rider....@pm.me> wrote:
Hi,

I just joined the mailing list due to a problem I'm having displaying and 
storing special characters.

I started with a UTF-8 CSV file and created a database from it. However, when 
I query it, the special characters become garbled. I'm using the GUI on 
Windows 10.

It starts with this in the CSV:
<name>Cañelas</name>

Then ends up with this when I export the query result into a text file:
<name>Ca�las</name>


Help please.

Bit









