On Thu, Sep 15, 2016 at 5:36 AM, Andrew Smith <asmit...@littlesvr.ca> wrote:
> Hello
> I administer a relatively busy wiki for our school, at
> https://wiki.cdot.senecacollege.ca
> In August I migrated the wiki from an older server which, as far as I
> can tell, was running 1.15.4. The new server had the latest MediaWiki
> version available, which as far as I can tell is now 1.27.
> The database server didn't change, only the web server.
> After updating the mediawiki files (I started from 1.27 and added in missing
> stuff as described here https://www.mediawiki.org/wiki/Manual:Moving_a_wiki
> ) I ran update.php
> I can't remember if it printed any errors the first time I ran it. There was
> a lot of output and I don't understand half of it. I think it ran without
> errors. Also I ran it multiple times since then and haven't noticed any
> errors.
> Everything seemed to work. But now that the new semester started and we have
> a lot of new students - we discovered a very serious problem: the wiki won't
> allow new user registrations.
> You can go and try yourself. Usually the interface just hangs there, never
> coming back with a response. Sometimes I get an error like this:
> MySQL [cdotwiki_db]> SELECT user_id FROM `mw_user` WHERE user_name =
> ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
> Just now I finished migrating the database to a brand new MySQL 5.7.15
> server, thinking that maybe I would see some change. But nothing changed.
> Because it's 1AM I got some debugging going, and almost certainly this was
> the first MediaWiki request to the database that failed (from SHOW ENGINE
> INNODB STATUS):
> ---TRANSACTION 4748, ACTIVE 62 sec
> 8 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 8
> MySQL thread id 1345, OS thread handle 140329361524480, query id 20771
> web-cdot1.sparc cdotwiki_usr cleaning up
> Trx read view will not see trx with id >= 4742, sees < 4742
> TABLE LOCK table `cdotwiki_db`.`mw_user` trx id 4748 lock mode IS
> RECORD LOCKS space id 70 page no 157 n bits 624 index user_name of table
> `cdotwiki_db`.`mw_user` trx id 4748 lock mode S locks gap before rec
> Record lock, heap no 244 PHYSICAL RECORD: n_fields 2; compact format; info
> bits 0
>  0: len 6; hex 41736f583139; asc AsoX19;;
>  1: len 4; hex 000001ac; asc     ;;
> Record lock, heap no 558 PHYSICAL RECORD: n_fields 2; compact format; info
> bits 0
>  0: len 8; hex 41736d6974683230; asc Asmith20;;
>  1: len 4; hex 000036c2; asc   6 ;;
> TABLE LOCK table `cdotwiki_db`.`mw_user` trx id 4748 lock mode IX
> RECORD LOCKS space id 70 page no 436 n bits 112 index PRIMARY of table
> `cdotwiki_db`.`mw_user` trx id 4748 lock_mode X locks rec but not gap
> Record lock, heap no 42 PHYSICAL RECORD: n_fields 17; compact format; info
> bits 0
>  0: len 4; hex 000036c2; asc   6 ;;
>  1: len 6; hex 00000000128c; asc       ;;
>  2: len 7; hex 21000001362118; asc !   6! ;;
>  3: len 8; hex 41736d6974683230; asc Asmith20;;
>  4: len 0; hex ; asc ;;
>  5: len 30; hex
> 3a70626b6466323a7368613235363a31303030303a3132383a2f66545962; asc
> :pbkdf2:sha256:10000:128:/fTYb; (total 222 bytes);
>  6: len 0; hex ; asc ;;
>  7: len 21; hex 61736d6974683230406c6974746c657376722e6361; asc
> asmit...@littlesvr.ca;;
>  8: len 14; hex 3230313630393135303530303137; asc 20160915050017;;
>  9: len 30; hex
> 623061346535323762613365336462656133323035633666343564663163; asc
> b0a4e527ba3e3dbea3205c6f45df1c; (total 32 bytes);
>  10: SQL NULL;
>  11: len 30; hex
> 396561386335613365663263623666353062303736646165393934393331; asc
> 9ea8c5a3ef2cb6f50b076dae994931; (total 32 bytes);
>  12: len 14; hex 3230313630393232303530303130; asc 20160922050010;;
>  13: len 14; hex 3230313630393135303530303130; asc 20160915050010;;
>  14: SQL NULL;
>  15: len 4; hex 80000000; asc     ;;
>  16: SQL NULL;
> TABLE LOCK table `cdotwiki_db`.`mw_watchlist` trx id 4748 lock mode IX
> RECORD LOCKS space id 70 page no 157 n bits 624 index user_name of table
> `cdotwiki_db`.`mw_user` trx id 4748 lock_mode X locks rec but not gap
> Record lock, heap no 558 PHYSICAL RECORD: n_fields 2; compact format; info
> bits 0
>  0: len 8; hex 41736d6974683230; asc Asmith20;;
>  1: len 4; hex 000036c2; asc   6 ;;
> TABLE LOCK table `cdotwiki_db`.`mw_logging` trx id 4748 lock mode IX
> TABLE LOCK table `cdotwiki_db`.`mw_recentchanges` trx id 4748 lock mode IX
> If I kill this transaction (thread id 1345) via phpmyadmin - strangely
> nothing will appear to happen. My browser will keep spinning, as if it's
> waiting for a response.
> I am starting to become desperate. With every day that passes the problem
> grows. And I don't have a lot of ideas left.
> It's not the MySQL server. I disabled nearly all extensions. I can't really
> read the output from innodb status (example above). What should I do?
> My best guess now is that the upgrade didn't work 100%. But what do I do
> now? Is it an option to downgrade? If yes - how far back? Will that even
> help or is the database corrupt? How can I check whether it's corrupt?
> If I have to roll back to the backup from before the upgrade (shiver) -
> will it be possible to apply the changes made to the new database to the
> restored old version? Probably not?
> Please help! I thought I got this upgrade to work and I really hope I didn't
> screw hundreds of users over :(
> Cheers,
> Andrew
A partially completed upgrade probably wouldn't cause this. If the
schema change wasn't made, you should get an instant error about
missing columns. (However, if the schema change (ALTER TABLE) was
still in progress, it might be blocking everything that touches the
user table. You could use SHOW PROCESSLIST; to verify that this is not
the case. But if it were, I'd expect a whole lot of other stuff to be
broken too.)
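
If you want to script that SHOW PROCESSLIST check, something like the
following Python sketch could work. It assumes you've already fetched
the output of SHOW FULL PROCESSLIST as a list of dicts keyed by column
name (Id, Command, Time, State, Info), e.g. via a dict cursor; the
function name is made up. The state strings are ones the MySQL manual
lists for ALTER TABLE and metadata-lock waits, but double-check them
against your server version.

```python
from typing import Any, Dict, List

# Thread states associated with a running ALTER TABLE, or with a statement
# stuck behind one (assumed names; verify against your MySQL version).
DDL_STATES = {"altering table", "copy to tmp table", "Waiting for table metadata lock"}


def schema_change_suspects(rows: List[Dict[str, Any]]) -> List[int]:
    """Return Ids of threads that look like a schema change, or are blocked by one.

    `rows` is assumed to be SHOW FULL PROCESSLIST output fetched as dicts
    with the column names as keys (Id, Command, Time, State, Info).
    """
    suspects: List[int] = []
    for row in rows:
        # Info holds the statement text, or None for idle connections.
        info = (row.get("Info") or "").strip().upper()
        state = row.get("State") or ""
        if info.startswith("ALTER TABLE") or state in DDL_STATES:
            suspects.append(row["Id"])
    return suspects
```

An empty result would point away from a stuck ALTER TABLE and toward
something else holding the locks.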

It's difficult to downgrade MW across so many versions. You'd
probably be able to downgrade from 1.27 -> 1.26, but going all the way
back to 1.15 would be extremely difficult.

I don't know much about DB locking, so the following might be stupid and wrong:

I was under the impression that MySQL would add a line beginning with
"-------" when a transaction is actually waiting for a lock. So the
SHOW ENGINE INNODB STATUS output you pasted above almost makes me think
that it's not so much that MediaWiki is waiting for a lock, as that MW
is holding a bunch of locks and for some reason never sending a commit.
(This might be totally the wrong conclusion; I don't really know
anything about MySQL locks.) So maybe something is wrong on the PHP
side, and I'd also check the PHP error log, and maybe try enabling the
MediaWiki debug log and checking that.
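
To make that reasoning a bit more mechanical, here's a rough Python
sketch that splits the TRANSACTIONS part of SHOW ENGINE INNODB STATUS
into per-transaction summaries and flags whether each one is waiting
or just holding locks. It assumes each transaction block starts with
"---TRANSACTION <id>, ..." and that InnoDB marks a blocked transaction
with a line starting "------- TRX HAS BEEN WAITING"; the helper names
are hypothetical and the parsing is deliberately loose.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Start of each transaction block, e.g. "---TRANSACTION 4748, ACTIVE 62 sec".
TRX_RE = re.compile(r"^---TRANSACTION (\d+), (?:ACTIVE (\d+) sec)?")
THREAD_RE = re.compile(r"^MySQL thread id (\d+)")
TABLE_LOCK_RE = re.compile(r"^TABLE LOCK table (\S+) trx id \d+ lock mode (\S+)")
# Assumed marker that InnoDB prints only when a transaction is blocked on a lock.
WAIT_MARKER = "------- TRX HAS BEEN WAITING"


@dataclass
class TrxSummary:
    trx_id: int
    active_secs: Optional[int] = None
    thread_id: Optional[int] = None
    waiting: bool = False
    table_locks: List[str] = field(default_factory=list)


def summarize_transactions(status_text: str) -> List[TrxSummary]:
    """Summarize each transaction block: thread id, wait flag, table locks held."""
    summaries: List[TrxSummary] = []
    current: Optional[TrxSummary] = None
    for raw in status_text.splitlines():
        line = raw.strip()
        m = TRX_RE.match(line)
        if m:
            current = TrxSummary(
                trx_id=int(m.group(1)),
                active_secs=int(m.group(2)) if m.group(2) else None,
            )
            summaries.append(current)
            continue
        if current is None:
            continue  # skip anything before the first transaction block
        m = THREAD_RE.match(line)
        if m:
            current.thread_id = int(m.group(1))
        elif line.startswith(WAIT_MARKER):
            current.waiting = True
        else:
            m = TABLE_LOCK_RE.match(line)
            if m:
                current.table_locks.append(f"{m.group(1)} {m.group(2)}")
    return summaries


if __name__ == "__main__":
    sample = """---TRANSACTION 4748, ACTIVE 62 sec
MySQL thread id 1345, OS thread handle 140329361524480, query id 20771
TABLE LOCK table `cdotwiki_db`.`mw_user` trx id 4748 lock mode IX
"""
    for trx in summarize_transactions(sample):
        state = "WAITING" if trx.waiting else "holding locks, not waiting"
        print(trx.trx_id, trx.thread_id, trx.active_secs, state, trx.table_locks)
```

Fed the block you pasted (with the "> " quote markers removed), it
should report transaction 4748 on thread 1345 as holding locks on
mw_user, mw_watchlist, mw_logging and mw_recentchanges rather than
waiting, which is the pattern I'm guessing at.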

As an aside, maybe double-check the MySQL isolation level and make
sure it's set to REPEATABLE-READ, the InnoDB default. (No idea if that
would help or not.)


