Ok I understand, but this doesn't resolve the initialization problem, I'm sure there is a bug, or
something which can be improved a lot. I can't use the UNIQUE constraint when I have to much key
which doesn't fit in RAM.
Because even If have memcache/bloom filter in front, I still need to create my initial dataset with
the unique constraint,and even if my initial dataset is unique, I can't insert them in my table
because of the bug explained before.
If mysql can do a sort so quickly (create index), it should be trivial to reuse this sorted data,
eliminate the duplicate and create the UNIQUE index.
Le 12/02/2014 09:24, Justin Swanhart a écrit :
> Hi,
>
> This is not a bug, but how b tree indexes work. For them to be efficient
they must fit in ram.
> There are buffering mechanisms that can be used for secondary indexes in
some cases, because the
> write can be done without a read, but ONLY when the index is not unique. It
if it unique, then the
> index dive is necessary and a btree traversal can take multiple random IO to
get to the leaf node.
>
> Faster IO can help, so you can look into flash storage.
>
> Consider fronting your lookups with memcache (a hash index) or a bloom filter
(there are many
> implementations on the net) to reduce lookups.
>
> --Justin
Le 12/02/2014 09:04, Pierre a écrit :
Using this technique I have the same Issue. It's now running for severals
hours, I'm at a 40% and
looking at show full processlist, it's getting slower and slower. It will never
finish.
I think there is a bug here.
Firstly, regardly the memory usage, It doesn't correctly use the buffer I did
set, I think it's only
using the key_buffer_size. myisam_sort_buffer_size or bulk_insert_buffer_size
are not used on this
task.
So what's happening ? When the RAM is full, mariadb is doing random access on
the disk to sort and
filter Go of data ! That's why my cpu was only used a few percent, the process
was in i/o wait most
of the time.
So what I am saying here is : mariadb can't crate UNIQUE keys/index if the rows
doesn't fit in RAM.
However if I try to create a standard index (ie non unique), it works well and
it's done in less
than an hour.
ALTER IGNORE TABLE mytable ADD INDEX (c1), ADD INDEX(c2);
Query OK, 349086532 rows affected (44 min 25.21 sec)
Records: 349086532 Duplicates: 0 Warnings: 0
In this second usage case, Maria is doing a good work by using the
myisam_sort_buffer_size. I think
it's doing something like an external/merge sort, spliting the rows in part
that fit in RAM, sorting
them, merging them and creating index.
It was 100% cpu most of the time, when It was not it was because mysql was
loading the rows in RAM
from hard disk (and not doing a random access on the hard disk like in create
unique index). So why
UNIQUE index is not behaving the same way ?
It's easy to reproduce the bug, just create a binary file of 2 or 3 x size of
RAM, then load data
infile and try to create a UNIQUE index on it. It will never end.
>Am 10.02.2014 13:45, schrieb Pierre:
>> Mariadb is getting slower and slower when I'm inserting a massive amout of
data. I'm trying to
insert 166 507 066
>> rows (12go) using load data infile '' into an empty table. I splitted my
file in 13 parts of the
same size because
>> it was too long to insert in one shot. When I inserted more than 100M rows,
it starts to be
extremely slow.
>
>he reason are the permamently updated keys
>in general for large inserts on a new table
>
>UNIQUE KEY `p1` (`p1`),
>UNIQUE KEY `p2` (`p2`),
>
>* remove keyes
>* insert data
>* add kyes
>
>https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~maria-discuss
More help : https://help.launchpad.net/ListHelp
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~maria-discuss
More help : https://help.launchpad.net/ListHelp