Thank you for all your answers. They really helped me to speed up my setup.

Indeed, innodb_buffer_pool_size is important in my case, as you suggested, but I 
don't really know why.

Unless direct I/O is used, the InnoDB buffer pool is largely redundant with the 
kernel FS cache. So if MySQL does not compete with other software for memory, a 
larger pool should only save syscalls, not disk accesses.

In fact, I didn't even plan to configure innodb_buffer_pool_size at all. For my 
use case, the FS cache would have been fast enough.

So why is innodb_buffer_pool_size so important when importing so much data?
This is only intuition, but the only explanation I can see is that it buffers 
index updates: IOW, secondary indexes are not updated on disk immediately on 
commit (which does not hurt Durability as long as the changes can be recovered 
after a crash). If I'm right, are there any other settings to make MariaDB even 
more aggressive about this?
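
For reference, the mechanism I am guessing at is (I believe) InnoDB's change 
buffer. If so, it is controlled by settings like these; this is a sketch of what 
I would try, not values I have validated:

```ini
# my.cnf sketch (assumption: stock MariaDB/InnoDB, values untested on my setup)
[mysqld]
# Buffer changes to secondary index pages in memory instead of
# updating the pages on disk immediately ("all" is the default).
innodb_change_buffering = all
# Let the change buffer grow up to 50% of the buffer pool
# (the default is 25%, 50 is the documented maximum).
innodb_change_buffer_max_size = 50
```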

Here are the results for different values of innodb_buffer_pool_size, all other 
settings being identical:
128M (default) -> 67 hours
1G -> 25 hours
2G -> 13.4 hours
4G -> 12.1 hours
The attached files contain full information for 2G & 4G (it looks like for 4G, I 
captured the status before MariaDB had finished its background tasks).

I think the CPU is fast enough to process everything in less than 5 hours.
By switching from the USB HDD to an SSD, I should reach this goal :)

Since there is only 1 client, I tried innodb_buffer_pool_instances=1, but that 
didn't change anything.


On 04/26/15 21:39, jocelyn fournier wrote:
> You should try to enable innodb_file_per_table (and recreate the table), to 
> avoid filling the ibdata file.

Unless Barracuda is used, I don't think this affects performance. I tested it, 
and it was not faster.

> Try also to increase the innodb_log_buffer_size to 16M (your Innodb_log_waits 
> is increasing).

Ah, this helped.
Since a non-negligible number of transactions exceed 16M, I even increased it 
to 128M.
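
For anyone following along, the symptom and the fix look like this (a sketch; 
the 128M value is just what suited my transaction sizes):

```sql
-- If this counter keeps growing, the log buffer is too small and
-- writers are stalling while it is flushed:
SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';

-- The corresponding my.cnf change (needs a server restart):
--   innodb_log_buffer_size = 128M
```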

>  How much memory do you have on your system ? The innodb_buffer_pool_size of 
> 1G seems to be a little bit low.

12G RAM

> Depending on your usage, you could also benefit from switching to TokuDB 
> (fast insert and much better compression)

It was not easy to test on this machine, whose gcc is too old (gcc >= 4.7 is 
required), so I used another MySQL server inside a Debian chroot.

Here are the results for different values of tokudb_cache_size:
1G -> 24.8 hours
2G -> 12.6 hours
4G -> 13.4 hours

> Try also to use the Barracuda file_format instead of Antelope : 
> innodb_file_format = Barracuda & innodb_file_format_max = Barracuda.

This would mean changing the CREATE TABLE statements in the software, and I 
prefer to spend the time evaluating TokuDB.

> For such a big amount of data to import, you could also try to increase the 
> innodb_log_file_size.

I did one test with 1G instead of 500M (and innodb_buffer_pool_size=2G) and it 
didn't change anything.


On 04/27/15 18:09, MARK CALLAGHAN wrote:
> For the fsync rate, my memory is vague, but I think InnoDB respects the 
> write-ahead-log rule even with innodb_flush_method = nosync and this means 
> that before a dirty page is written back the log must be forced for any 
> changes to that page. If you run strace you can determine which files get the 
> fsync and if it is only the log then my memory is correct.

Since my first email, I haven't done any other tests with unsafe settings, but 
IIRC there were also fsyncs to ibdata. Isn't that visible in 'show status'?
| Innodb_data_fsyncs                            | 334552       |
| Innodb_os_log_fsyncs                          | 94311        |
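
To answer the strace question, the output of something like 
`strace -f -e trace=fsync,fdatasync -p <mysqld pid>` can be summarized per file 
descriptor with a small script like this (a sketch; the sample lines are made 
up to show the strace output format, and fds still have to be matched to files 
via /proc/<pid>/fd):

```python
import re
from collections import Counter

def count_fsyncs(strace_lines):
    """Count fsync/fdatasync calls per file descriptor in strace output."""
    counts = Counter()
    for line in strace_lines:
        # Matches e.g. "[pid 1234] fsync(9) = 0"
        m = re.search(r'(?:fsync|fdatasync)\((\d+)', line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# Made-up sample strace output, for illustration only:
sample = [
    '[pid 1234] fsync(9) = 0',
    '[pid 1234] fsync(9) = 0',
    '[pid 1235] fdatasync(12) = 0',
]
print(count_fsyncs(sample))  # fd 9 -> 2 calls, fd 12 -> 1 call
```

If only the fd belonging to the redo log shows up, Mark's recollection is 
correct.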

> I almost always use innodb_flush_method=O_DIRECT and then configure enough 
> background write threads.

For the import specifically, I could try O_DIRECT.
During normal operation, we never use this option because, with several services 
on the same machine, it's too difficult to find the best value for 
innodb_buffer_pool_size, and we prefer to let the kernel cache IO for all 
processes.
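
For the one-off import, the combination Mark describes would look something like 
this in my.cnf (a sketch; the thread counts are guesses for my hardware, not 
tested values):

```ini
[mysqld]
# Bypass the kernel FS cache so data pages live only in the buffer pool
innodb_flush_method = O_DIRECT
# More background IO threads to compensate for losing kernel writeback
# (the default for both is 4)
innodb_write_io_threads = 8
innodb_read_io_threads = 8
```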

> For write amplification you are computing the rate as Innodb-write-rate / 
> InnoDB-read-rate.

No, I compared the read rate of the source DB (which is not a SQL DB) with the 
write rate of mysqld.
But you're right that reading the source DB does not read any secondary index.
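
To be explicit, the write-amp figure I quoted is computed like this (a sketch 
with made-up byte counts, not my actual measurements):

```python
def write_amplification(source_bytes_read, mysqld_bytes_written):
    """Ratio of bytes written by mysqld to bytes read from the source DB.

    Caveat from this thread: the source read does not touch secondary
    indexes, while the writes include them, so this overstates the
    "pure" per-index write-amp.
    """
    return mysqld_bytes_written / source_bytes_read

# Made-up example: reading 10 GiB from the source produces 45 GiB of writes
amp = write_amplification(10 * 2**30, 45 * 2**30)
print(f"write-amp: {amp:.1f}x")  # -> write-amp: 4.5x
```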

> But I think your InnoDB-read-rate is the scans of the PK index while the 
> write rate includes IO for PK and secondary indexes. This is still write-amp, 
> but deserves an asterisk to explain that point. Write-amp from the 
> doublewrite buffer can be much cheaper than write-amp from dirty page 
> writeback where "cheaper" means fewer seeks.
> 
> If you want to make the load go faster than creating the secondary indexes 
> after the table has been loaded might help.
> 
> Another interesting question is whether to load in parallel or one table at a 
> time. Sometimes, loading one table at a time helps because you are more 
> likely to keep the working set in memory during the load. This can be very 
> important when secondary indexes are maintained during the load as you can do 
> too much random read - modify - write for secondary indexes, although the 
> InnoDB change buffer helps a bit with that. The choice here isn't all 
> parallel vs 1-at-a-time. Doing a few at a time gives some parallel benefit, 
> but also gives each table load a better chance at keeping secondary indexes 
> in memory during the load.

Good to know, but that would make the import algorithm too complex.
I think we've reached the point where buying more RAM is cheaper.
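
For the record, Mark's suggestion of building secondary indexes after the load 
would look roughly like this (a sketch; `t`, `sk` and the index name are 
placeholders, and I have not measured this myself):

```sql
-- 1. Create the table with only the primary key
CREATE TABLE t (id BIGINT PRIMARY KEY, sk INT, payload BLOB);

-- 2. Bulk-load all the rows (LOAD DATA INFILE ... / INSERT ...)

-- 3. Build the secondary index in one pass (sequential scan + sort)
--    instead of random index-page updates throughout the load
ALTER TABLE t ADD INDEX idx_sk (sk);
```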

> It might help to increase innodb_max_dirty_pages_pct to 90 (you used 75). I 
> don't have current experience with it but you might also want to increase 
> innodb_max_dirty_pages_pct_lwm. The goal is to not write back dirty pages too 
> soon.

I tried innodb_buffer_pool_size=2G and innodb_max_dirty_pages_pct=90 with:
- innodb_max_dirty_pages_pct_lwm=0
- innodb_max_dirty_pages_pct_lwm=75
I can't tell whether it was faster or not; the difference in speed is too small.

> Making your InnoDB log files as large as possible will also help defer 
> writing back dirty pages and reduce your page write back rate (and write-amp) 
> in a good way.
> | innodb_log_file_size                                   | 524288000

Answered above.




Attachment: InnoDB-2G-48377s.tar.gz
Description: application/gzip

Attachment: InnoDB-4G-43614s.tar.gz
Description: application/gzip

Attachment: TokuDB-2G-45480s.tar.gz
Description: application/gzip

Attachment: TokuDB-4G-48331s.tar.gz
Description: application/gzip

_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp
