Bugs item #2968881, was opened at 2010-03-11 12:01
Message generated for change (Comment added) made by heringcheng
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2968881&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SQL/Core
Group: MonetDB5 "stable"
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hering Cheng (heringcheng)
Assigned to: Niels Nes (nielsnes)
Summary: Unicode Support

Initial Comment:
Does MonetDB support Unicode/UTF-8 by default?  I built my own 64-bit version 
on Solaris SPARC but am unable to load UTF-8 data into a VARCHAR field.  What am 
I missing?  Here is the output from mclient:

MAPI  = chen...@myserver:50000
QUERY = DELETE FROM mytable; COPY 132590 RECORDS INTO mytable FROM 
'mydata.raw.txt.tmp' DELIMITERS '|', '\n', '"';
ERROR = !SQLException:sql:value 'ESAVINGSSTORE.COM,â–’INC.' from line 7797 field 
5 not inserted, expecting type str
        !SQLException:importTable:failed to import table

I am also attaching the record containing the offending data.


----------------------------------------------------------------------

Comment By: Hering Cheng (heringcheng)
Date: 2010-03-11 12:33

Message:
Thank you for the response.

While I understand that my data is not actually UTF-8, I was still
wondering if there is anything special I need to do to enable UTF-8.

This non-UTF-8 data comes from a Sybase IQ database with VARCHAR as the
data type.  It must be garbage that Sybase somehow accepted.
Is there a recommended way for me to cleanse my text file (i.e., to rid it
of non-UTF-8 bytes) before I load it into MonetDB?
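
One possible way to cleanse such a file is sketched below in Python. It is
not a MonetDB-recommended procedure. It assumes that any line failing to
decode as UTF-8 is really Latin-1 (ISO-8859-1), which matches the \240 byte
discussed in this thread but may not hold for other data. The function names
are hypothetical.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, line))
    return bad


def to_utf8(data: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode data as UTF-8, line by line.

    Lines that are already valid UTF-8 are kept byte for byte. Any other
    line is ASSUMED to be in the fallback encoding and is transcoded.
    """
    out = []
    for line in data.split(b"\n"):
        try:
            line.decode("utf-8")
            out.append(line)
        except UnicodeDecodeError:
            out.append(line.decode(fallback).encode("utf-8"))
    return b"\n".join(out)


if __name__ == "__main__":
    # Made-up sample rows; the second one contains a raw Latin-1 no-break space.
    sample = b'1|"ACME"\n2|"ESAVINGSSTORE.COM,\xa0INC."\n'
    for lineno, raw in find_invalid_utf8_lines(sample):
        print(lineno, raw)
    cleaned = to_utf8(sample)
    print(find_invalid_utf8_lines(cleaned))
```

Reporting the offending line numbers first makes it possible to inspect the
rows before rewriting anything. If the whole file is known to be Latin-1,
running `iconv -f ISO-8859-1 -t UTF-8` over it should have the same effect.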

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-03-11 12:23

Message:
MonetDB does support UTF-8 fully.  However, your data isn't UTF-8.  At
least when I download it from SF, I see that between the comma and the word
INC there is a Latin-1 no-break space (\240).  That is not UTF-8, so
MonetDB complains correctly about that.
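
The point about \240 can be checked with a short Python sketch. It only
illustrates the encoding rule and says nothing about how MonetDB validates
its input. The helper name is hypothetical. Octal \240 is the byte 0xA0; in
UTF-8 that bit pattern is a continuation byte and cannot appear on its own,
while a real no-break space (U+00A0) is encoded as the two bytes 0xC2 0xA0.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The field as it appears in the attached record: a lone Latin-1 byte 0xA0.
latin1_field = b"ESAVINGSSTORE.COM,\xa0INC."
# The same text with the no-break space properly encoded as UTF-8.
utf8_field = "ESAVINGSSTORE.COM,\u00a0INC.".encode("utf-8")

print(is_valid_utf8(latin1_field))   # False
print(is_valid_utf8(utf8_field))     # True
print(0o240 == 0xA0)                 # True: \240 octal is byte 0xA0
print(0xA0 & 0xC0 == 0x80)           # True: 10xxxxxx is a continuation byte
```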


----------------------------------------------------------------------

_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
