Hi! With some frequency, people ask me why UTF-8 is slower than single-byte charsets.
The thing is, they have something using, say, VARCHAR(30) CHARACTER SET WIN1252, convert it to VARCHAR(30) CHARACTER SET UTF8, test with the same data, and get slower queries. The database also grows, and the record size limit, counted in characters, shrinks. That is because UTF8 in Firebird reserves up to 4 bytes per character, so a VARCHAR(30) UTF8 field occupies 120 bytes in the record buffer. And indeed, if they test VARCHAR(120) CHARACTER SET WIN1252 against VARCHAR(30) CHARACTER SET UTF8, database size and query times are similar. But that is just a test; it's not the real-world scenario users want.

We have old problems here. For example, the record size limit is tracked in https://github.com/FirebirdSQL/firebird/issues/1130 As commented there, I tried just increasing the constant and it seems to simply work.

Then we have the RLE record compression algorithm, which "compresses" bytes that are well known to be unused. We have even had patches to improve this poor algorithm. I believe that is not the way to go.

Let's still call it "record compression", but I believe it should be more active. Instead of working only on the record buffer and its length, it should have access to the record format. Then it can encode things in a more active way: trimming out unused bytes of CHAR/VARCHAR fields, and encoding numbers and booleans better. We may use the protocol-buffers format as inspiration. And then we probably don't need any RLE compression at all, as most of the data (once the unused bytes are gone) is not that repetitive.
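To make this concrete, here is a minimal sketch of what I mean (hypothetical code, not anything in the Firebird tree; the record layout and all names are made up): an encoder that knows the field types, so it stores an integer as a protobuf-style varint, a boolean as one byte, and a VARCHAR by its actual length instead of its declared buffer size.

// Hypothetical sketch, not Firebird code: a format-aware record
// encoder that sees each field's type instead of an opaque buffer.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Base-128 varint, as in protocol buffers: 7 data bits per byte,
// high bit set on every byte except the last.
static void putVarint(std::vector<uint8_t>& out, uint64_t v)
{
    while (v >= 0x80)
    {
        out.push_back(uint8_t(v) | 0x80);
        v >>= 7;
    }
    out.push_back(uint8_t(v));
}

// ZigZag mapping so small negative numbers also stay small.
static uint64_t zigZag(int64_t v)
{
    return (uint64_t(v) << 1) ^ uint64_t(v >> 63);
}

// Encode one made-up record: an INTEGER, a BOOLEAN and a
// VARCHAR(30). The VARCHAR is stored as varint(length) plus the
// used bytes only; the 120-byte UTF8 buffer and its padding never
// reach the page.
static std::vector<uint8_t> encodeRecord(int64_t id, bool flag,
    const std::string& name)
{
    std::vector<uint8_t> out;
    putVarint(out, zigZag(id));      // number: 1+ bytes, not 8
    out.push_back(flag ? 1 : 0);     // boolean: 1 byte
    putVarint(out, name.size());     // actual length in bytes
    out.insert(out.end(), name.begin(), name.end());
    return out;
}

int main()
{
    std::vector<uint8_t> rec = encodeRecord(42, true, "Adriano");
    std::printf("encoded record: %zu bytes\n", rec.size());  // 10
    return 0;
}

The point is that the encoder works from the format rather than from an opaque buffer, so unused bytes never reach the page in the first place, and there is little repetitive data left for RLE to win on.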
What do you think, and is there any active work in this regard?

Adriano

Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel