On Wednesday 27 October 2004 08:35, Zabach, Elke wrote: > Hermann Himmelbauer wrote: > > Hi, > > I need to store multibyte characters into my database (MaxDB). To my > > mind, there are two approaches: > > > > 1) Store everything in UTF-8 format in normal varchars. I don't know if > > MaxDB > > will have problems with this, probably there will be errors such as that > > the > > string length can be faulty. > > 2) Store everything in the 2-byte Unicode format - but this will nearly > > double > > the size of my database, moreover I don't know if I will have to convert > > every string from/to UTF-8 to/from Unicode. > > > > What's the appropriate solution? > > It is (as always) depending > - on the number of characters not being ASCII-7-Bit, but being stored with > 3 or more bytes in UTF8. You said, that storing in UCS2 will nearly double, > so we can assume, ASCII-7-bit is the main thing and only few character will > need more than one byte even in UTF8 - on the functions / qualifications > you need. > With UTF8 stored in (VAR)CHAR (n) BYTE you will not be able to handle > LENGTH, SUBSTR, (x)FILL, (x)TRIM, and so on or LIKE correctly. Sorting > could do, perhaps even in a way you can live with. - on the knowledge about > the data you have. If you want to store at maximum n character (no matter > if ASCII or UNICODE), you will have to specify (VAR)CHAR (n). But if you > want to store n UTF8-character in a (VAR)CHAR(x)BYTE-column, which value is > needed for x (not to be too small if several character need more than just > one byte)? BTW: the database then will not be able to check if more than n > character are stored in this value. - (VAR)CHAR(n) BYTE cannot be compared > to (VAR)CHAR(n) ASCII/UNICODE, but on the other hand: to store UTF8 in > (VAR)CHAR(n) ASCII is not the recommended way of using, although I do not > know any knockout-criteria for this. - For storing as UTF8 no > _UNICODE=YES-instance is needed, meaning that no unicode-storage of the > catalog (info about tables, columns, views, users,...) is done. - If a > record needs for example half the space in UTF8 than it does in UCS2, > double the number of records can be stored in one page, meaning less pages > are needed for the same number of records, meaning less I/O / smaller data > cache needed for all records if your search strategies are so bad to need a > scan. > > ==> if you are sure that you now and ever will only need a stupid container > for your data and all handling (functions and so on, LIKE) will be done in > the application, then maybe (although storage is not too expensive) storing > as UTF8 is a good decision. If you really know this...
Many thanks for your quick answer! So it seems best to store my strings in the Unicode (UCS-2) format. Storage space is not a big issue. To explain my situation, I already have a MaxDB database running in ASCII mode. Characters are now encoded using the latin-1 character set. As our application will become international, we have to switch to Unicode. At first I need to convert the data from latin-1 to unicode - can this be done with MaxDB (e.g. with the loader utility) or do I have to do the conversion from 1byte ASCII to 2byte UCS by hand (e.g. python script)? Moreover I access the database via ODBC (Windows and Linux) and via the python interface. Is unicode supported on these call interfaces? I read somewhere that there are problems with Unicode on Linux ODBC - is this true? Moreover I read that it's possible to read/write Unicode strings with python but it's impossible to use unicode strings for query strings - is this still true? I need simple select statements like "select * from tableA where columnX like '�%'" and this should be possible also from python. What about sorting tables in unicode? Will e.g. "�" be interpreted as "OE" and be sorted accordingly? Best Regards, Hermann -- [EMAIL PROTECTED] GPG key ID: 299893C7 (on keyservers) FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7 -- MaxDB Discussion Mailing List For list archives: http://lists.mysql.com/maxdb To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
