[Wikitech-l] cleaning database of spam

2013-02-26 Thread Petr Bena
Hi, this is related more to MediaWiki than to Wikimedia, but I guess
this list is watched a bit more closely.

Is there any extension that allows permanent removal of deleted pages
(or possibly only selected deleted pages) from the database, and
removal of blocked users from the database?

Imagine you have a MediaWiki wiki with a 20 GB database, of which
19.99 GB is spam and indefinitely blocked users. I think a lot of
wikis have this problem, so an extension to deal with it would be
useful for many small wikis.

What is the exact procedure for properly removing a page from the
database so that it doesn't break anything? What needs to be deleted,
and in which order?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Platonides
maintenance/deleteArchivedRevisions.php permanently removes the content
of deleted pages from the db.

For removing those users, see
http://www.mediawiki.org/wiki/Extension:User_Merge_and_Delete

Also remember that, due to the way MySQL works, it may not release those
20GB back to the filesystem.
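
A minimal sketch of how the maintenance script might be invoked, written
in Python so the command can be inspected before anything is run. The
wiki path /var/www/wiki is hypothetical, and the --delete flag is an
assumption about the script's options; check the script's --help output
on your MediaWiki version before relying on it.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def purge_command(wiki_root: str, really_delete: bool = False) -> List[str]:
    """Build the argv for MediaWiki's deleteArchivedRevisions.php.

    wiki_root is a hypothetical install path. The --delete flag is an
    assumption; verify it against the script's --help on your version.
    """
    script = PurePosixPath(wiki_root) / "maintenance" / "deleteArchivedRevisions.php"
    argv = ["php", str(script)]
    if really_delete:
        argv.append("--delete")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is deleted by accident.
    print(shlex.join(purge_command("/var/www/wiki", really_delete=True)))
```

Printing the command rather than executing it is deliberate: the removal
is permanent, so it is worth taking a backup and reading the command
before running it by hand.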


Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Petr Bena
Even if MySQL doesn't release the space, the cleanup will at least stop
the data file from growing.

Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Jay Ashworth
----- Original Message -----
> From: Platonides platoni...@gmail.com

> Also remember that due to the way mysql works, it may not release
> those 20GB back to the filesystem.

In particular, to get anything for your trouble, you will probably need
to dump the database, drop it, shut down MySQL and switch it to innodb
tablespace-per-file, turn it back on, and then reload the dump, as I 
recently had to.

This way, at least, once you clean it up, you can do the same dump and 
reload procedure on only one table, not the whole shootin' match.
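
The order of operations above can be sketched as a small Python helper
that only lists the steps and runs nothing. The database name wikidb and
the dump file names are hypothetical, credentials are omitted, and the
steps marked as manual depend on the host (service manager, location of
my.cnf). Treat it as an outline under those assumptions, not a tested
migration script.

```python
from typing import List, Optional, Tuple

# (description, shell command or None for a host-specific manual step)
Step = Tuple[str, Optional[str]]


def reload_steps(database: str, dump_file: str) -> List[Step]:
    """Return the full dump-and-reload steps in order. Nothing is executed."""
    return [
        ("dump the database", f"mysqldump --databases {database} > {dump_file}"),
        ("drop it", f'mysql -e "DROP DATABASE {database}"'),
        ("shut down MySQL", None),
        ("set innodb_file_per_table=1 under [mysqld] in my.cnf", None),
        ("start MySQL again", None),
        ("reload the dump", f"mysql < {dump_file}"),
    ]


def single_table_steps(database: str, table: str, dump_file: str) -> List[Step]:
    """Later cleanups: dump and reload just one table."""
    return [
        ("dump one table", f"mysqldump {database} {table} > {dump_file}"),
        ("reload it", f"mysql {database} < {dump_file}"),
    ]


if __name__ == "__main__":
    for description, command in reload_steps("wikidb", "wikidb.sql"):
        print(f"{description}: {command or '(manual step)'}")
```

The second function covers the later case: once each table has its own
file, only the table that was cleaned needs to be dumped and reloaded.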

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA   #natog  +1 727 647 1274

Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Petr Bena
Yes, that's what I already do.

Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Petr Bena
You meant innodb_file_per_table

Re: [Wikitech-l] cleaning database of spam

2013-02-26 Thread Jay Ashworth
----- Original Message -----
> From: Petr Bena benap...@gmail.com

> You meant innodb_file_per_table

Yes; I forgot the exact name, and tried (apparently unsuccessfully) to 
make that look as little like an exact parameter as possible.

Happily, the OP runs that way anyway.

Cheers,
-- jra