Bugs item #2968881, was opened at 2010-03-11 12:01
Message generated for change (Comment added) made by heringcheng
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2968881&group_id=56967
Category: SQL/Core
Group: MonetDB5 "stable"
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hering Cheng (heringcheng)
Assigned to: Niels Nes (nielsnes)
Summary: Unicode Support
Initial Comment:
Does MonetDB support Unicode/UTF-8 by default? I built my own 64-bit version
on Solaris SPARC but am unable to load UTF-8 data into a VARCHAR field. What am
I missing? Here is the output from mclient:
MAPI = chen...@myserver:50000
QUERY = DELETE FROM mytable; COPY 132590 RECORDS INTO mytable FROM
'mydata.raw.txt.tmp' DELIMITERS '|', '\n', '"';
ERROR = !SQLException:sql:value 'ESAVINGSSTORE.COM,▒INC.' from line 7797 field
5 not inserted, expecting type str
!SQLException:importTable:failed to import table
I am also attaching the record containing the offending data.
----------------------------------------------------------------------
Comment By: Hering Cheng (heringcheng)
Date: 2010-03-11 12:33
Message:
Thank you for the response.
While I understand that my data is not actually UTF-8, I was still
wondering if there is anything special I need to do to enable UTF-8.
This non-UTF-8 data comes from a Sybase IQ database with VARCHAR as the
data type. This must have been some garbage that somehow Sybase accepted.
Is there a recommended way for me to cleanse my text file (i.e., to rid it
of non-UTF-8 data) before I load it into MonetDB?
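
One possible way to cleanse such a file, sketched here in Python, is to leave
lines that already decode as UTF-8 untouched and to transcode only the
remaining lines from a fallback encoding. The fallback of Latin-1 is an
assumption based on the no-break space found in the attached record; the
Sybase export may have used a different code page. The function names
recode_line and recode_file are hypothetical, not part of MonetDB.

```python
def recode_line(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode it from fallback."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any invalid byte
        return raw
    except UnicodeDecodeError:
        # Assumption: the invalid bytes are text in the fallback encoding.
        return raw.decode(fallback).encode("utf-8")


def recode_file(src_path: str, dst_path: str, fallback: str = "latin-1") -> int:
    """Copy src to dst line by line, transcoding invalid lines; return how many changed."""
    fixed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:  # binary mode splits on b"\n" only
            out = recode_line(raw, fallback)
            if out != raw:
                fixed += 1
            dst.write(out)
    return fixed
```

For example, recode_file("mydata.raw.txt.tmp", "mydata.utf8.txt") would write
a copy in which the byte 0xA0 has become the UTF-8 sequence 0xC2 0xA0; the
output file name is only an illustration. Working per line keeps any lines
that are already valid UTF-8 from being double-encoded.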
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-03-11 12:23
Message:
MonetDB fully supports UTF-8. However, your data isn't UTF-8. At
least when I download it from SF, I see that between the comma and the word
INC there is a Latin-1 no-break space (\240, i.e. the byte 0xA0). A lone 0xA0
byte is not valid UTF-8, so MonetDB correctly complains about it.
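
To locate bytes like this before running COPY INTO, a strict UTF-8 decode of
each line is usually enough. The following is a minimal Python sketch; the
function name find_invalid_utf8 is hypothetical, and the line numbers it
reports count "\n"-terminated lines, which may differ from the line number
MonetDB reports if quoted fields contain embedded newlines.

```python
def find_invalid_utf8(data: bytes):
    """Yield (line_number, byte_offset_in_line, offending_byte) for each
    line that fails strict UTF-8 decoding. Only the first bad byte of a
    line is reported."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the index of the first byte that could not be decoded
            yield lineno, exc.start, line[exc.start]


if __name__ == "__main__":
    sample = b"ok line\nESAVINGSSTORE.COM,\xa0INC.\n"
    for lineno, offset, byte in find_invalid_utf8(sample):
        print(f"line {lineno}, offset {offset}: byte 0x{byte:02X}")
```

The same text with the no-break space encoded as 0xC2 0xA0 passes the check,
since that two-byte sequence is the UTF-8 encoding of U+00A0.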
----------------------------------------------------------------------