On Wednesday 27 October 2004 08:35, Zabach, Elke wrote:
> Hermann Himmelbauer wrote:
> > Hi,
> > I need to store multibyte characters into my database (MaxDB). To my
> > mind, there are two approaches:
> >
> > 1) Store everything in UTF-8 format in normal varchars. I don't know if
> > MaxDB
> > will have problems with this, probably there will be errors such as that
> > the
> > string length can be faulty.
> > 2) Store everything in the 2-byte Unicode format - but this will nearly
> > double
> > the size of my database, moreover I don't know if I will have to convert
> > every string from/to UTF-8 to/from Unicode.
> >
> > What's the appropriate solution?
>
> It is (as always) depending
> - on the number of characters not being ASCII-7-Bit, but being stored with
> 3 or more bytes in UTF8. You said, that storing in UCS2 will nearly double,
> so we can assume, ASCII-7-bit is the main thing and only few character will
> need more than one byte even in UTF8 - on the functions / qualifications
> you need.
> With UTF8 stored in (VAR)CHAR (n) BYTE you will not be able to handle
> LENGTH, SUBSTR, (x)FILL, (x)TRIM, and so on or LIKE correctly. Sorting
> could do, perhaps even in a way you can live with. - on the knowledge about
> the data you have. If you want to store at maximum n character (no matter
> if ASCII or UNICODE), you will have to specify (VAR)CHAR (n). But if you
> want to store n UTF8-character in a (VAR)CHAR(x)BYTE-column, which value is
> needed for x (not to be too small if several character need more than just
> one byte)? BTW: the database then will not be able to check if more than n
> character are stored in this value. - (VAR)CHAR(n) BYTE cannot be compared
> to (VAR)CHAR(n) ASCII/UNICODE, but on the other hand: to store UTF8 in
> (VAR)CHAR(n) ASCII is not the recommended way of using, although I do not
> know any knockout-criteria for this. - For storing as UTF8 no
> _UNICODE=YES-instance is needed, meaning that no unicode-storage of the
> catalog (info about tables, columns, views, users,...) is done. - If a
> record needs for example half the space in UTF8 than it does in UCS2,
> double the number of records can be stored in one page, meaning less pages
> are needed for the same number of records, meaning less I/O / smaller data
> cache needed for all records if your search strategies are so bad to need a
> scan.
>
> ==> if you are sure that you now and ever will only need a stupid container
> for your data and all handling (functions and so on, LIKE) will be done in
> the application, then maybe (although storage is not too expensive) storing
> as UTF8 is a good decision. If you really know this...

Many thanks for your quick answer!

So it seems best to store my strings in the Unicode (UCS-2) format. Storage 
space is not a big issue.

To explain my situation, I already have a MaxDB database running in ASCII 
mode. Characters are now encoded using the latin-1 character set. As our 
application will become international, we have to switch to Unicode.

At first I need to convert the data from latin-1 to unicode - can this be done 
with MaxDB (e.g. with the loader utility) or do I have to do the conversion 
from 1byte ASCII to 2byte UCS by hand (e.g. python script)?

Moreover I access the database via ODBC (Windows and Linux) and via the python 
interface. Is unicode supported on these call interfaces? I read somewhere 
that there are problems with Unicode on Linux ODBC - is this true? Moreover I 
read that it's possible to read/write Unicode strings with python but it's 
impossible to use unicode strings for query strings - is this still true? I 
need simple select statements like "select * from tableA where columnX like 
'�%'" and this should be possible also from python.

What about sorting tables in unicode? Will e.g. "�" be interpreted as "OE" and 
be sorted accordingly?

Best Regards,
Hermann

-- 
[EMAIL PROTECTED]
GPG key ID: 299893C7 (on keyservers)
FP: 0124 2584 8809 EF2A DBF9  4902 64B4 D16B 2998 93C7


--
MaxDB Discussion Mailing List
For list archives: http://lists.mysql.com/maxdb
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to