Marian Hettwer wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Daniel,
Daniel Ouellet wrote:
Marian Hettwer wrote:
As soon as replication starts, mysql gets very unresponsive:
- -bash-3.1$ time mysqladmin -uroot -p proc stat
Enter password:
+----+-------------+-----------+----+---------+------+-----------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State
| Info |
+----+-------------+-----------+----+---------+------+-----------------------------------------------------------------------+------------------+
| 4 | system user | | | Connect | 204 | Waiting for
master to send event |
|
| 5 | system user | | | Connect | 8661 | Has read all
relay log; waiting for the slave I/O thread to update it |
|
| 7 | root | localhost | | Query | 0 |
| show processlist |
+----+-------------+-----------+----+---------+------+-----------------------------------------------------------------------+------------------+
Uptime: 308 Threads: 1 Questions: 6328 Slow queries: 0 Opens: 0
Flush tables: 1 Open tables: 24 Queries per second avg: 20.545
real 0m15.463s
user 0m0.010s
sys 0m0.020s
15 bloody seconds to return mysqladmin proc stat ?
That ain't good.
Wasn't it that your slave actually catch up to the master and replicate
all the tables your master had?
well, not all tables, but quite a lot of them. Some are ignored. See the
my.cnf I provided.
You don't provide mysql.err logs, etc and we don't know if it actually
replicate your tables or not. I guess from this it did.
It is replicating the tables I have. my mysql.err file looks like that:
060915 17:33:29 mysqld started
060915 17:33:29 [Warning] /usr/local/libexec/mysqld: ignoring option
'--low-priority-updates' due to invalid value 'ON'
060915 17:33:29 [Warning] /usr/local/libexec/mysqld: ignoring option
'--low-priority-updates' due to invalid value 'ON'
- --> Seems like that parameter doesn't exist anymore in MySQL 5.0 ...
I'll look into it...
Starting by looking at errors and then making sure a replication setup
doesn't have any errors is always a good thing before saying it doesn't
work. So, when no errors happen, may be many things will work just fine.
060915 17:33:29 [Warning] Could not increase number of max_open_files to
more than 8096 (request: 8192)
- --> You mentioned something about that later in your mail. Could be a
problem, eh?
Go read it again. I think I pointed it many times so far. I could give
you more to work with, but not at this time. If the first reading, and
seconds didn't help yet, obviously there is still something to learn and
you will always remember it if you dig it up first. But go read the page
on the net again about it, see emails I answer not to long ago about
that too and even do the very simple test I said to do to test this
problem. I don't think I could spell it better.
If you still have issue with this, I will be glad to point you in the
right direction, but do your homework first and try it out. The answer
was provided very clearly and repeated as well and IS in the document
about it as well.
May be my English is the problem there, but the informations is there.
Read a few more times if needed.
060915 17:33:29 [Warning] mysql.user table is not updated to new
password format; Disabling new password usage until
mysql_fix_privilege_tables is run
- --> Yeah well, I could run mysql_fix_privilege_tables, however, I bet it
has nothing todo with my problem.
That's not fix privilege. Men, go read please. Look for old_password.
060915 17:33:29 [Warning] Can't open and lock time zone table: Table
'mysql.time_zone_leap_second' doesn't exist trying to live without them
060915 17:33:29 [Warning] Neither --relay-log nor --relay-log-index were
used; so replication may break when this MySQL server acts as a slave
and has his hostname changed!! Please use
'--relay-log=babelfish45-relay-bin' to avoid this problem.
- --> As I'm not about to change the hostname, I'll fix that problem later.
That is not the host name here. Go read the manual. They tell you to
configure the my.cnf to use a log file reflecting your host name, not to
change your host name. I think spending some time reading will help you
work on the software you want to use. This is well explain in the log as
well as in the manual.
They even tell you what to use:
--relay-log=babelfish45-relay-bin
Where does it say hostname needs to be changed?
060915 17:33:29 [Note] /usr/local/libexec/mysqld: ready for connections.
Version: '5.0.22-log' socket: '/tmp/mysql.sock' port: 3306 OpenBSD
port: mysql-server-5.0.22
060915 17:33:29 [Note] Slave SQL thread initialized, starting
replication in log 'foo-bin.000040' at position 358083515, relay log
'./babelfish45-relay-bin.000004' position: 37101832
060915 17:33:29 [Note] Slave I/O thread: connected to master
'[EMAIL PROTECTED]:3306', replication started in log 'foo-bin.000040' at
position 358083543
Look lie at a minimum this works.
Let see 308 seconds up only for the server, did 20.5 query per seconds
for that time with would be your 6328 queries there, of witch all finish
based on this show process and also looks like it finish to mirror it
and now is waiting for the master to send more.
That's right. And according to the queries per second it's continuesly
getting data from its master (approx. 49 queries per second, all through
replication)
Uptime: 231027 Threads: 1 Questions: 11540813 Slow queries: 0 Opens:
0 Flush tables: 1 Open tables: 11 Queries per second avg: 49.954
Good then.
I have no clue how big your database might be or not. Nor how many
tables, etc.
all in all it's 175 MyISAM files, but only a small part of them are
actually open and in use.
As you see above, only 11 tables are open. But some of them are rather
large (400 - 600 MB).
But look like form previous errors that it try to use table that are not
available. So, if you really want a good mirror, you need to make sure
it will replicate all the tables it needs, or are link together, or the
replication process will stop, only the bin log files will keep growing.
Clue on that is if you have more then one relay-bin file on the slave,
then it is safe to assume the replications stop. Not the copy over of
the data, but the update of the tables.
The only think I know is that you did install from packages. Great. Then
started master/slave and look like it worked.
And as I said, access to MySQL itself is pretty slow.
As in: getting a "show slave status\G" needs between 4 and 14 seconds,
or a "mysqladmin proc stat" needs up to 16 seconds.
And this has really nothing to do with "how big is your database" or
"how many open tables do you have". Not at all.
I don't think I said it was normal. That's why I asked for the error
logs first. See, may be it's just me, but before I say something is not
working properly or is slow, etc. I make dam sure I have no errors
before I clam that. If I see errors, then I correct my problem until I
see none and then when all look good to me, no errors, etc and I still
have a problem, then research it, read the doc again and try a few
things and if all failed, then I asked why it might be.
Look to me there is still plenty of place that work can be done before
saying it's not working.
Then you were trying to query the server I guess for data may be, but it
was up only for 5 minutes and based on the query listed, etc. It was
really busy to mirror the data from the master to the new slave. Wasn't
it what it was doing here?
It's still doing exactly that. IO_THREAD gets data from master,
SQL_THREAD is working at nearly 50 queries / second to write all the
data it got.
May be tables are lock, may be IO are slow, may be drives are slow, may
be the drivers for the network card you use is not working well, may be
limits are reach ( hint about open table above), may be errors are
slowing the process down, may be errors on new password format for each
query on slave slow down the process instead of using the old_password
standard that was changed between version here.... Get it?
So, are you expecting to have all the data ready as soon as you start
MySQL on a slave. Look like you were expecting it to be ready right
away? I don't know but I know this. To query data, it has to be there
first.
Oh it is there. Besides, I used a tarball to get a starting point to
start the replication. This tarball is probably 1 hour behind in
replication. So I should have seen a replication drift of approx 1 hour.
Which I did. And while the box is replicating and trying to catch up, of
course I should be able to connect to MySQL and do some queries.
Sure you can. I my case, I don't even start with a tarball. I turn up
new slave and let them catch up from a long time ago. I do have dump
regularly for backup, but I don't even bother to do that for a new slave
and it catch up very quickly, well many hours, but it does catch up no
problem and does work very well as well.
To me look like the box was up 308 seconds and started to mirror the
source, updated that databases/tables, may be creating the index as it
goes, I don't know your data see, and may be some of your data was
requesting the table to be lock when updated instead of may be insert
delay or something like that.
The statement I was talking about was "show slave status" or "show
processlist". Both statements have nothing to do with locked tables or
the like.
And you sure can get the results quickly as well:
mysql> show processlist;
+-----+-------------+-----------+------+---------+---------+-----------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State
| Info
|
+-----+-------------+-----------+------+---------+---------+-----------------------------------------------------------------------+------------------+
| 1 | system user | | NULL | Connect | 8596289 | Waiting for
master to send event | NULL
|
| 141 | system user | | NULL | Connect | 0 | Has read
all relay log; waiting for the slave I/O thread to update it | NULL
|
| 187 | root | localhost | NULL | Query | 0 | NULL
| show
processlist |
+-----+-------------+-----------+------+---------+---------+-----------------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)
But now that's it's been up for 216000 seconds, how does it work?
Dunno. Sometimes I have doubts about how MySQL is calculating the time
in a replication setup.
Reminds me of checking that my OpenBSD box here has it's ntpd running
and is getting the correct time :-)
Always a good thing to check and have ntpd running, but MySQL look at
the time it was up and running.
Is your data mirror well or not. Any error in mysql.err file or not.
Responsive to query or not?
my data mirror? You're talking about the slave (the OpenBSD box), not
the Master (some linux box), right?
Yes I am. I could care less about the master, as long as your replicator
accounts have the rights it needs on the master and that the files are
available and your slave is setup properly and you don't have version
conflict like old/new password, or reverse the version between
master/slave where 5.x will NOT work on 4.x and a few other things, you
are good.
Well, according to .err it's doing okay...
Well, you still have many errors in there. So, start by cleaning them up.
I just wonder if you expected it to be all mirror and ready as soon as
you issue the start slave?
You probably haven't read my first email correctly. Once again, the
"mysqladmin proc status" is taking very very long. This has nothing to
do with catching up with the master. I do know that it'll take some time :)
Start by getting ride of all the errors you have and then will see what
might be next.
I do know MySQL quite well (MySQL 4.1 in fact) and for the OpenBSD
Installation I followed the guidelines at www.openbsdsupport.org, which
was basicly just increasing the kern.maxfiles and changing
/etc/login.conf
There was more then that, including to make sure you start the daemon
with the class as well, etc.
That's true... so let's see wether I did a mistake there...
- -bash-3.1$ ps aux | grep mysqld
_mysql 18287 1.8 4.2 189292 88972 p0 D Fri05PM 131:53.49
/usr/local/libexec/mysqld --basedir=/usr/local --datadir=/var/mysql
- --user=_mysql --pid-file=/var/mysql/babelfish45.mobile.rz.pid --open-files
mhettwer 26116 1.0 0.0 372 468 p1 S+ 9:52AM 0:00.01 grep
mysqld
root 30015 0.0 0.0 488 472 p0 I Fri05PM 0:00.08
/bin/sh /usr/local/bin/mysqld_safe
looks like I didn't. the mysqld is running as _mysql ...
Go back and read it again and also the email I sent on this too. Running
under a user, doesn't mean it is also wihtout a dought that it is ALSO
using that class for it.
Many reference to man page there and explications provided as well.
Just a shout hint for you. Class are use on login. Does _mysql login in
when you start it? hmmmm... For answer, look at /etc/master.passwd.
Specially the end of the line there.
An other hint. Try su _mysql, do you get a shell? Hmmm...
Then if you don't specify it, will it use the class you think it is
using? hmm....
Exercise left for the user.
All answers already provided and in the document as well.
http://crivens.terrorteam.de/~rabauke/OpenBSD/MySQL/my.cnf.txt
Ah, I nearly forgot:
- -bash-3.1$ sysctl kern.maxfiles
kern.maxfiles=8096
Did you notice that the suggeted configuration have double the allow
files in the kernel oppose to the my.cnf configuration?
There is a reason for that. Se the top of the document said that when
you open about 29 tables, you will get the error 9. However you see 64
in the default limits right? Why you think that is? MySQL documentations
does explain that mysql always open two files minimum per tables in most
cases, so may be mysql should rewrite the meaning of max_open_files in
the configurations. But anyway, simple rule of thumb. But twice the
number of files limits in sysctl as you put in my.cnf, not the same as
you do here.
Thanks for pointing out. I'll fix that right now.
open-files-limit is now at 4096
Good, but also check how you start MySQL as well ok?
Not that I think you hit that limit here as you didn't say anything
about error 9, but should one day start to have a lots of tables and
come close to this limits, then you will not know why that is.
I will know now...
Good, but still check how you started MySQL ok?
Don't forget that the man page is clear on that subject the class is use
on login, but here the daemon doesn't login, see master.passwd. You
start it manually from your shell, it will use the default class, not
the _mysql class, unless you expressly tell it to do so.
This is almost every time overlooked.
Many thinks that by defining the class, daemon will use it.
But when you start the daemon are you that user? No, you are not, so it
can't use that class then, but will use the default class. Just a word
of caution on this part as well if you didn't do it too.
Proof.
# su _mysql
This account is currently not available.
So, it can't then use the _mysql class when you start it, unless you do
specify it.
su -c _mysql root -c '/usr/local/bin/mysqld_safe &' > /dev/null &
It looks like doing a "sudo mysqld_safe" as my user has the same effect.
mysqld is running as _mysql
Although mysqld_safe is running as root. hm hm...
I'll better go with the correct way, changing to _mysql. Thanks for
pointing out.
You are welcome. See details above for using of class on login.
Hope this help you some.
Well let's see.
But thanks so far..
Let see when no errors are present in the near future.
Best,
Daniel