Thank you for all your answers. They really helped me to speed up my setup.
Indeed innodb_buffer_pool_size is important in my case, as you suggested, but I don't really know why. Unless direct I/O is used, innodb_buffer_pool_size is redundant with the kernel FS cache, so if MySQL does not compete with other software, it should only save syscalls, not I/O accesses. In fact, I didn't even plan to configure innodb_buffer_pool_size at all; for my use, the FS cache would have been fast enough. So why is innodb_buffer_pool_size so important when importing so much data?

This is only intuition, but the only explanation I see is that it buffers index updates: in other words, secondary indexes are not updated on disk immediately on commit (which does not matter for durability as long as they can be recovered after a crash). If I'm right, are there any other settings to make MariaDB even more aggressive about this?

Here are the results for different values of innodb_buffer_pool_size, all other settings being identical:

  128M (default) -> 67 hours
  1G             -> 25 hours
  2G             -> 13.4 hours
  4G             -> 12.1 hours

The attached files contain the full information for 2G & 4G (it looks like for 4G, I captured the status before MariaDB had finished its background tasks). I think the CPU is fast enough to process everything in less than 5 hours; by switching from a USB HDD to an SSD, I should reach this goal :)

With only 1 client, I tried innodb_buffer_pool_instances=1 but that didn't change anything.

On 04/26/15 21:39, jocelyn fournier wrote:
> You should try to enable innodb_file_per_table (and recreate the table), to
> avoid filling the ibdata file.

Unless Barracuda is used, I don't think this affects performance. I tested it and it was not faster.

> Try also to increase the innodb_log_buffer_size to 16M (your Innodb_log_waits
> is increasing).

Ah, this helped. Given that a non-negligible number of transactions exceed 16M, I even increased it to 128M.

> How much memory do you have on your system? The innodb_buffer_pool_size of
> 1G seems to be a little bit low.
12G RAM.

> Depending on your usage, you could also benefit from switching to TokuDB
> (fast insert and much better compression)

It was not easy to test on this machine, whose gcc is too old (gcc >= 4.7 is required), so I used another MySQL server inside a Debian chroot. Here are the results for different values of tokudb_cache_size:

  1G -> 24.8 hours
  2G -> 12.6 hours
  4G -> 13.4 hours

> Try also to use the Barracuda file_format instead of Antelope:
> innodb_file_format = Barracuda & innodb_file_format_max = Barracuda.

That would mean changing the CREATE TABLE statements in the software, and I prefer to spend the time on evaluating TokuDB.

> For such a big amount of data to import, you could also try to increase the
> innodb_log_file_size.

I did one test with 1G instead of 500M (and innodb_buffer_pool_size=2G) and it didn't change anything.

On 04/27/15 18:09, MARK CALLAGHAN wrote:
> For the fsync rate, my memory is vague, but I think InnoDB respects the
> write-ahead-log rule even with innodb_flush_method = nosync and this means
> that before a dirty page is written back the log must be forced for any
> changes to that page. If you run strace you can determine which files get the
> fsync and if it is only the log then my memory is correct.

Since my first email, I haven't done any other test with unsafe settings, but IIRC there were also fsyncs to ibdata. Isn't it visible in 'show status'?

| Innodb_data_fsyncs   | 334552 |
| Innodb_os_log_fsyncs | 94311  |

> I almost always use innodb_flush_method=O_DIRECT and then configure enough
> background write threads.

Exceptionally for the import, I could try O_DIRECT. During normal operation we never use this option because, with several services on the same machine, it's too difficult to find the best value of innodb_buffer_pool_size, and we prefer to let the kernel cache I/O for all processes.

> For write amplification you are computing the rate as Innodb-write-rate /
> InnoDB-read-rate.
No, I computed the read rate of the source DB (which is not a SQL DB) vs. the write rate of mysqld. But you're right that reading the source DB does not read any secondary index.

> But I think your InnoDB-read-rate is the scans of the PK index while the
> write rate includes IO for PK and secondary indexes. This is still write-amp,
> but deserves an asterisk to explain that point. Write-amp from the
> doublewrite buffer can be much cheaper than write-amp from dirty page
> writeback where "cheaper" means fewer seeks.
>
> If you want to make the load go faster then creating the secondary indexes
> after the table has been loaded might help.
>
> Another interesting question is whether to load in parallel or one table at a
> time. Sometimes, loading one table at a time helps because you are more
> likely to keep the working set in memory during the load. This can be very
> important when secondary indexes are maintained during the load as you can do
> too much random read-modify-write for secondary indexes, although the
> InnoDB change buffer helps a bit with that. The choice here isn't all
> parallel vs 1-at-a-time. Doing a few at a time gives some parallel benefit,
> but also gives each table load a better chance at keeping secondary indexes
> in memory during the load.

Good to know, but that would make the import algorithm too complex. I think we have reached the point where buying more RAM is cheaper.

> It might help to increase innodb_max_dirty_pages_pct to 90 (you used 75). I
> don't have current experience with it but you might also want to increase
> innodb_max_dirty_pages_pct_lwm. The goal is to not write back dirty pages too
> soon.

I tried innodb_buffer_pool_size=2G and innodb_max_dirty_pages_pct=90 with:
- innodb_max_dirty_pages_pct_lwm=0
- innodb_max_dirty_pages_pct_lwm=75

I can't tell whether it was faster or not; the difference in speed is too small.
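As a side note, short of running strace, the counters quoted above can be pulled in one statement. This is just a sketch using the standard SHOW GLOBAL STATUS syntax:

```sql
-- Watch the counters discussed in this thread:
-- Innodb_log_waits growing suggests innodb_log_buffer_size is too small;
-- comparing Innodb_data_fsyncs with Innodb_os_log_fsyncs shows whether
-- syncs go only to the redo log or also to the data files (ibdata).
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Innodb_log_waits',
                        'Innodb_data_fsyncs',
                        'Innodb_os_log_fsyncs');
```

Sampling it twice a few minutes apart gives rates rather than totals.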
> Making your InnoDB log files as large as possible will also help defer
> writing back dirty pages and reduce your page write back rate (and write-amp)
> in a good way.
>
> | innodb_log_file_size | 524288000

Answered above.
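For anyone replaying these tests, here is roughly the my.cnf fragment that the settings above add up to. It is a sketch of my import configuration, not a recommendation; the values come from the experiments reported in this thread and should be sized to your RAM:

```ini
# Import-time settings discussed in this thread (illustrative values).
[mysqld]
innodb_buffer_pool_size      = 4G    # 128M -> 67 h, 1G -> 25 h, 2G -> 13.4 h, 4G -> 12.1 h
innodb_buffer_pool_instances = 1     # a single client; more instances didn't help
innodb_log_buffer_size       = 128M  # some transactions exceed 16M
innodb_log_file_size         = 500M  # raising it to 1G made no difference
innodb_max_dirty_pages_pct   = 90    # defer dirty page writeback
```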
InnoDB-2G-48377s.tar.gz
Description: application/gzip
InnoDB-4G-43614s.tar.gz
Description: application/gzip
TokuDB-2G-45480s.tar.gz
Description: application/gzip
TokuDB-4G-48331s.tar.gz
Description: application/gzip
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

