I agree that for handling large text, 
using TEXT + UTF-8/16 + compression (I prefer LZO) is a nice approach. 

In the Wikipedia case, UTF-8 compressed with gzip --fast (1117MB) is still
19% larger than the compressed local encoding (940MB), but 45% smaller than
the uncompressed local encoding (2013MB).
In Drizzle, using BLOB + local encoding + compression
would be the most space-efficient approach.
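For anyone who wants to try this on their own data set, here is a rough
sketch of the comparison using Python's zlib (compression level 1 is roughly
equivalent to gzip --fast). The sample string is just an illustration, not
the Wikipedia dump used for the numbers above:

```python
import zlib

# Hypothetical sample text; the measurements above used the full
# Japanese Wikipedia dump instead.
text = "これは圧縮テストのサンプル文字列です。" * 1000

utf8 = text.encode("utf-8")
eucjp = text.encode("euc-jp")   # local encoding

# zlib level 1 approximates `gzip --fast`
c_utf8 = zlib.compress(utf8, 1)
c_eucjp = zlib.compress(eucjp, 1)

print("uncompressed UTF-8 :", len(utf8))
print("uncompressed EUC-JP:", len(eucjp))
print("compressed UTF-8   :", len(c_utf8))
print("compressed EUC-JP  :", len(c_eucjp))
```

The relative sizes will depend heavily on the corpus, so the percentages
quoted above should only be expected to hold for similar Japanese text.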

The big problem is that we cannot do FULLTEXT search on compressed
values...

Regards,
----
Yoshinori Matsunobu
Senior MySQL Consultant
Sun Microsystems

MySQL Consulting Services:
http://www-jp.mysql.com/consulting/

> -----Original Message-----
> From: Stewart Smith [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 01, 2008 11:22 PM
> To: Yoshinori Matsunobu
> Cc: [EMAIL PROTECTED]; 'drizzle-discuss'
> Subject: Re: [Drizzle-discuss] Toru's thoughts on UTF8 and 
> CJK charsets
> 
> On Tue, Sep 30, 2008 at 03:06:40PM +0900, Yoshinori Matsunobu wrote:
> > The size was 2700MB. When I converted to local encoding (EUC-JP), 
> > the size was 2013MB. 
> 
> What if each of the text fields was gzipped? If using --fast option,
> shouldn't have much of a performance penalty (and can 
> possibly be a huge
> win in caching).
> 
> My theory is that compressed text fields give a bigger win than more
> efficient encodings... but it's just a theory at the moment :)
> 
> -- 
> Stewart Smith


_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
