Hi all!

I would like to put a note here for other people who are planning to use UTF-8 
charsets and ci/ai collations.

The tests i realized before (in a previous message) didn't show the real 
numbers.

After doing more tests, i realized that searchs using LIKE, STARTING WITH, 
CONTAINING, etc... have an HUGE performance penalty when using UTF8 and CI/AI 
collation.

Here are the numbers, based on a table with 5.500 records and searching in a 
utf8 ci/ai field

select * from pl1_cidades c where c.nome = 'Americana'
78ms

select * from pl1_cidades c where c.nome like 'Americana'
12s75ms (!!!)

select * from pl1_cidades c where c.nome containing 'Americana'
6s208ms (!!!)

And here are the numbers, based on a table with 5.500 records and searching in 
a iso8851-1 field:

select * from pl1_cidadesa c where c.nome = 'Americana'
78ms

select * from pl1_cidadesa c where c.nome like 'Americana'
98ms

select * from pl1_cidadesa c where c.nome containing 'Americana'
98ms

So, unless there are some plans to improve UTF8 searchs a lot in next firebird 
versions, for now i think it's a risk to use this charset if your project will 
need to use SQL instructions like above.

Regards,

Fabiano

--- In [email protected], "Fabiano" <fabiano@...> wrote:
>
> Hi!
> 
> I tried that (actually inserted 50.000 records each table), and here follows 
> the test results:
> 
> recreate table test_utf8 (
>   column1 varchar(100) character set UTF8 collate unicode_ci_ai);
> 
> recreate table test_iso8859_1 (
>   column1 varchar(100) character set ISO8859_1 collate pt_br);
> 
> -- 1m5s146
> set term !! ;
> execute block
> as
>   declare variable i integer;
> begin
>   i = 0;
>   while (i < 50000) do
>   begin
>     insert into test_utf8 values (uuid_to_char(gen_uuid()));
>     i = i + 1;
>   end
>   insert into test_utf8 values ('xxx');
> end !!
> set term ; !!
> 
> -- 1m4s959
> set term !! ;
> execute block
> as
>   declare variable i integer;
> begin
>   i = 0;
>   while (i < 50000) do
>   begin
>     insert into test_iso8859_1 values (uuid_to_char(gen_uuid()));
>     i = i + 1;
>   end
>   insert into test_iso8859_1 values ('xxx');
> end !!
> set term ; !!
> 
> -- 0s468
> select * from test_utf8 where column1 = 'xxx';
> 
> -- 0s561
> select * from test_iso8859_1 where column1 = 'xxx';
> 
> -- 1s716, 1s653, 1s841
> select first 1 * from test_utf8 order by column1;
> 
> -- 0s671, 0s717, 0s748
> select first 1 * from test_iso8859_1 order by column1;
> 
> So, the main performance difference here was ordering the data (2X-3X 
> slower). Other operations seems to have the same performance (althought i 
> didn“t use special characters in the records).
> 
> Regards,
> 
> Fabiano
> 
> --- In [email protected], "Fabiano" <fabianoaspro@> wrote:
> >
> > Firebird is extreme slow with UTF-8 character set.
> > 
> > Try it:
> > Create the same table with/without UTF-8 encoding, put 5000 records with no
> > indexes then do:
> > 
> > 'select * from table order by fieldA
> > 
> > select * from table where FieldA='xxxx'
> > 
> >  
> > 
> > Cheers
> > 
> >  
> > 
> > 
> > 
> > [Non-text portions of this message have been removed]
> >
>

Reply via email to