Bugs item #2968881, was opened at 2010-03-11 21:01
Message generated for change (Comment added) made by sjoerd
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2968881&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SQL/Core
Group: MonetDB5 "stable"
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hering Cheng (heringcheng)
Assigned to: Niels Nes (nielsnes)
Summary: Unicode Support

Initial Comment:
Does MonetDB support Unicode/UTF-8 by default?  I built my own 64-bit version 
on Solaris SPARC but am unable to load UTF-8 data into a VARCHAR field.  What am 
I missing?  Here is the output from mclient:

MAPI  = chen...@myserver:50000
QUERY = DELETE FROM mytable; COPY 132590 RECORDS INTO mytable FROM 
'mydata.raw.txt.tmp' DELIMITERS '|', '\n', '"';
ERROR = !SQLException:sql:value 'ESAVINGSSTORE.COM,▒INC.' from line 7797 field 
5 not inserted, expecting type str
        !SQLException:importTable:failed to import table

I am also attaching the record containing the offending data.


----------------------------------------------------------------------

>Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-03-11 21:41

Message:
I suppose you could use a tool such as iconv to convert your data from just
about any encoding to UTF-8, but you do need to know the encoding.
If you want to just convert no-break spaces and other weird characters to
something more "normal", I suppose you could use sed.
mclient can also convert data, but when you use the COPY INTO command with
an actual file name (as opposed to STDIN), the server reads the file
directly and does not do any conversion.
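
As a rough illustration of the conversion described above, here is a minimal
Python sketch. It assumes the source file is Latin-1 (ISO-8859-1), which is
only a guess based on the no-break space found in the attachment; the
function names reencode_to_utf8 and replace_nbsp are made up for this
example. The first function does roughly what
"iconv -f ISO-8859-1 -t UTF-8" does, the second does the kind of
substitution you might otherwise do with sed.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8.

    The default "latin-1" is an assumption; pass the real encoding if known.
    """
    return raw.decode(source_encoding).encode("utf-8")


def replace_nbsp(utf8_data: bytes) -> bytes:
    """Replace no-break spaces (U+00A0) with ordinary spaces in UTF-8 data."""
    return utf8_data.decode("utf-8").replace("\u00a0", " ").encode("utf-8")


# A fragment shaped like the offending field: Latin-1 byte 0xA0 after the comma.
latin1_field = b"ESAVINGSSTORE.COM,\xa0INC."
utf8_field = reencode_to_utf8(latin1_field)
print(utf8_field)                # the no-break space is now the two bytes C2 A0
print(replace_nbsp(utf8_field))  # the no-break space is now a plain space
```

To apply this to a whole file, read it in binary mode, pass the bytes
through reencode_to_utf8, and write the result to a new file before running
COPY INTO on it.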

----------------------------------------------------------------------

Comment By: Hering Cheng (heringcheng)
Date: 2010-03-11 21:33

Message:
Thank you for the response.

While I understand that my data is not actually UTF-8, I was still
wondering if there is anything special I need to do to enable UTF-8.

This non-UTF-8 data comes from a Sybase IQ database with VARCHAR as the
data type.  This must have been some garbage that Sybase somehow accepted. 
Is there a recommended way for me to cleanse my text file (i.e., to rid it
of non-UTF-8 data) before I load it into MonetDB?

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-03-11 21:23

Message:
MonetDB does support UTF-8 fully.  However, your data isn't UTF-8.  At
least when I download it from SF, I see that between the comma and the word
INC there is a Latin-1 no-break space (\240, i.e. the single byte 0xA0).
A lone 0xA0 byte is not a valid UTF-8 sequence, so MonetDB complains
correctly about that.
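
To see which lines of a file would be rejected for this reason, something
like the following Python sketch could be used. It is only an illustration:
the function name find_invalid_utf8_lines and the sample data are made up,
and the sample merely imitates the shape of the offending record.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical two-record sample; the second record contains a bare 0xA0 byte.
sample = b"ACME|ok\nESAVINGSSTORE.COM,\xa0INC.|bad\n"
print(find_invalid_utf8_lines(sample))

# In UTF-8 a no-break space (U+00A0) is two bytes, 0xC2 0xA0, which is valid.
print("\u00a0".encode("utf-8"))
```

A bare 0xA0 is a continuation byte with no lead byte before it, which is why
the decoder rejects it, while the two-byte form C2 A0 passes.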


----------------------------------------------------------------------
