I agree that for handling large text, using TEXT + UTF-8/16 + compression (I prefer LZO) is a nice approach.
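For what it's worth, the encoding-vs-compression comparison is easy to reproduce. Below is a minimal sketch (my own illustration, not the script used for the actual measurements): it encodes the same sample string as UTF-8 and as EUC-JP, then compresses both with zlib level 1, which roughly approximates gzip --fast. The sample text is a placeholder for real Wikipedia article data, so the ratios it prints are not meaningful, only the methodology is.

import zlib

# Placeholder sample; a real test would read article text from the dump,
# so the repetition here inflates the compression ratio.
sample = ("ウィキペディアは誰でも編集できるフリーの百科事典です。"
          "記事の本文は主に日本語で書かれています。") * 100

for codec in ("utf-8", "euc-jp"):
    raw = sample.encode(codec)
    packed = zlib.compress(raw, 1)  # level 1 roughly matches gzip --fast
    print("%s: raw=%d bytes, compressed=%d bytes (%.0f%% of raw)"
          % (codec, len(raw), len(packed), 100.0 * len(packed) / len(raw)))

Run against the actual dump, this is the kind of measurement behind the numbers below.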
In the Wikipedia case, compressed (gzip --fast) UTF-8 (1117MB) is still 19% larger than compressed local encoding (940MB), but 45% smaller than uncompressed local encoding (2013MB). In Drizzle, using BLOB + local encoding + compression would be the most space-efficient approach. The big problem is that we cannot do FULLTEXT search on compressed values...

Regards,

----
Yoshinori Matsunobu
Senior MySQL Consultant
Sun Microsystems
MySQL Consulting Services: http://www-jp.mysql.com/consulting/

> -----Original Message-----
> From: Stewart Smith [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 01, 2008 11:22 PM
> To: Yoshinori Matsunobu
> Cc: [EMAIL PROTECTED]; 'drizzle-discuss'
> Subject: Re: [Drizzle-discuss] Toru's thoughts on UTF8 and CJK charsets
>
> On Tue, Sep 30, 2008 at 03:06:40PM +0900, Yoshinori Matsunobu wrote:
> > The size was 2700MB. When I converted to local encoding (EUC-JP),
> > the size was 2013MB.
>
> What if each of the text fields was gzipped? If using the --fast option,
> it shouldn't have much of a performance penalty (and can possibly be a
> huge win in caching).
>
> My theory is that compressed text fields give a bigger win than more
> efficient encodings... but it's just a theory at the moment :)
>
> --
> Stewart Smith

