ArielGlenn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/325945 )

Change subject: cleanup of README for general configuration and sample config file
......................................................................


cleanup of README for general configuration and sample config file

* whitespace cleanup
* get rid of unused config options halt, forcenormal, perdumpindex in docs
  and/or sample config
* add docs for stubs options, per-wiki config
* get rid of dead dblists in sample config, add tabledocs option
* add other standard options to sample config to fill it out some

Bug: T152679
Change-Id: I6a80ecdee474449d200979fca2ffe0839f45f0a4
---
M xmldumps-backup/doc/README.config
M xmldumps-backup/samples/wikidump.conf.sample
2 files changed, 97 insertions(+), 63 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/xmldumps-backup/doc/README.config b/xmldumps-backup/doc/README.config
index 2e17831..d4d76f4 100644
--- a/xmldumps-backup/doc/README.config
+++ b/xmldumps-backup/doc/README.config
@@ -24,14 +24,14 @@
 
 ===Structure of a configuration file
 
-Each section of the configuration file starts with a name in brackets, with 
+Each section of the configuration file starts with a name in brackets, with
 no leading spaces. For example:
 
 [wiki]
 
 This would introduce the options related to the wikis that are processed.
 
-The following sections are recognized and must be present, even if no 
+The following sections are recognized and must be present, even if no
 configuration options are provided for the section:
 
 wiki, output, reporting, database, tools, cleanup, chunks
@@ -43,51 +43,44 @@
 
 The wiki section accepts the following configuration options:
 
-dblist -- File with list of all databases for which dumps will be generated
-                      Default value: none
-skipdblist -- ... except for the ones in this file. (This is a bit odd;
-               why not just list the ones you want and be done with it? 
-               Because the WMF list is generated automatically and used
-               for other things, so it is not feasible to remove dbs
-               from it by hand and still keep it in sync as new projects 
-               are created.)
-                      Default value: none
-privatelist -- File with list of databases which should have dumps produced 
-               that are put in the "private" dirctory.  At WMF this means 
-               wikis that are not publically readable by the world.
-                      Default value: none
-flaggedrevslist -- File with list of databases which have flagged revisions 
-               enabled.  (Really, we should be able to determine this 
-               another way instead of keeping a separate list, right?)
-wikidatalist    -- File with list of databases which act as a wikibase
-               repo. For Wikimedia projects this currently consists
-               of the project 'wikidata'.
-globalusagelist -- File with list of databases which act as a media
-               repo with the GlobalUsage extension. For Wikimedia projects
-               this currently consists of the project 'commons'.
-biglist -- File with list of large wikis for which no history dumps are 
-               generated because they are too huge. (This must be an old 
-               deprecated option; these days we do not care how big they 
-               are, we dump them anyways.)
-                      Default value: none
-dir -- Full path to the root directory of the MediaWiki installation for which
-                        dumps are produced.  This assumes one installation for 
-                multiple wikis, nd therefore one LocalSettings.php or 
+dblist      -- File with list of all databases for which dumps will be generated
+                       Default value: none
+skipdblist  -- ... except for the ones in this file. (This is a bit odd;
+                why not just list the ones you want and be done with it?
+                Because the WMF list is generated automatically and used
+                for other things, so it is not feasible to remove dbs
+                from it by hand and still keep it in sync as new projects
+                are created.)
+                       Default value: none
+privatelist -- File with list of databases which should have dumps produced
+                that are put in the "private" dirctory.  At WMF this means
+                wikis that are not publically readable by the world.
+                       Default value: none
+flowlist    -- File with list of databases which have the Flow extension
+                enabled on them; these will have Flow page content dumped.
+                       Default value: none
+dir         -- Full path to the root directory of the MediaWiki installation
+                        for which dumps are produced.  This assumes one installation
+                for multiple wikis, nd therefore one LocalSettings.php or
                 equivalent that covers all the projects. At WMF this is done
-                by having the files InitialiseSetttings.php and 
+                by having the files InitialiseSetttings.php and
                 CommonSettings.php which have various if stanzas depending
                 on what it enabled on specific projects.
-                      Default value: none
-halt -- what does this do?
-              Default value: 0
+                Per-wiki configuration of this option can be done in separate
+                sections, as described later.
+                       Default value: none
+tablejobs   -- Full path to the yaml file describing the tables to be dumped
+                 via mysql for each wiki.  It is fine to add tables here that
+                 do not exist on all wikis; table existence will be checked
+                 before a dump is attempted.
 
 Of those options, the following are required:
 ...
 
 
 === Output section
-public -- full path to directory under which all dumps will be created, 
-                      in subdirectories named for the name of the database 
+public -- full path to directory under which all dumps will be created,
+                      in subdirectories named for the name of the database
               (wikiproject) being dumped, in subdirectories by date
                       Default value: /dumps/public
 private -- full path to directory under which all dumps of private wikis
@@ -98,22 +91,19 @@
 temp -- full path to directory under which temporary files will be created;
               this should not be the same as the public or private directory.
                       Default value: /dumps/temp
-index -- name of the top-level index file for all projects that is 
+index -- name of the top-level index file for all projects that is
               automatically created by the monitoring process
                       Default value: index.html
 webroot -- url to root of the web directory which serves the public files (this
               is simply the web url that gets people to the content in the "public"
               directory defined earlier)
                       Default value: http://localhost/dumps
-templatedir -- directory in which various template files such as those for mail or 
-              error reports, rss feed updates or the per-project-and-date html files 
+templatedir -- directory in which various template files such as those for mail or
+              error reports, rss feed updates or the per-project-and-date html files
               are found
                       Default value: home
-perdumpindex -- name of the index file created for a dump for a given project
-              on a given date
-                      Default value: index.html
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
 since default values are provided.
 
 === Reporting section
@@ -133,7 +123,7 @@
               any more
                       Default value: 3600
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
 since default values are provided.
 
 === Database section
@@ -146,7 +136,7 @@
                       config value has.
                       Default value: 16M
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
 since default values are provided.
 
 === Tools section
@@ -173,11 +163,11 @@
                       Default value:/bin/grep
 checkforbz2footer -- Location of the checkforbz2footer binary
               This is part of the mwbzutils package.
-              Default value: /usr/local/bin/checkforbz2footer            
+              Default value: /usr/local/bin/checkforbz2footer
 recompressxml -- Location of the recompressxml binary
               Default value: /usr/local/bin/recompressxml
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
 since default values are provided.
 
 === Cleanup section
@@ -185,34 +175,34 @@
               removing the oldest one each time a new one is created
                       Default value: 3
 
-The above option does not have to be specified in the config file, 
+The above option does not have to be specified in the config file,
 since a default is provided.
 
 === Chunks section
-chunksEnabled -- buggy. set to any value to enable. Why? Because 
+chunksEnabled -- buggy. set to any value to enable. Why? Because
              any string value counts as "true", even the value...
              "False" :-D
                       Default value: False
 pagesPerChunkHistory
                Set to a comma separated ist of starting page ID nums
-               in order to generate a set of stub files each one 
+               in order to generate a set of stub files each one
                starting from the next pageID.
                Example:
                pagesPerChunkHistory=5000,5000,100000,100000
                This would generate four chunks, containing:
-               1 to 5000, 5001 through 10000, 10001 through 110000, 
+               1 to 5000, 5001 through 10000, 10001 through 110000,
                110001 through end
                Alternatively you can provide one number in which case
                the job will be split into chunks each containing that
                number of pages. Example:
                pagesPerChunkHistory=50000
                This will generate a number of chunks with pages from
-               1 through 50000, 50001 through 100000, 100001 through 
+               1 through 50000, 50001 through 100000, 100001 through
                150000, and so on.
                       Default value: False
 revsPerChunkHistory -- currently disabled, do not use!
                       Default value: False
-pagesPerChunkAbstract -- as pagesPerChunkHistory but for the abstract 
+pagesPerChunkAbstract -- as pagesPerChunkHistory but for the abstract
                generation phase
                       Default value: False
 checkpointTime -- save checkpoints of files containing revision text
@@ -223,12 +213,28 @@
                written, and opening a new file for the next portion
                of the XML output.  This can be useful if you want
                to produce a large number of smaller files as input
-               to XML-crunching scripts, or if you are dumping 
-               a very large wiki which has a tendency to fail 
+               to XML-crunching scripts, or if you are dumping
+               a very large wiki which has a tendency to fail
                somewhere in the middle (*cough*en wikipedia*cough*).
                       Default value: 0 (no checkpoints produced)
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
+since default values are provided.
+
+=== Stubs section (i.e.: [stubs])
+orderrevs  -- set to 1 if it is desired that the dump is ordered
+              by revision id within each page
+              Default: 0 (false)
+minpages   -- stubs (revision metadata) are retrieved in smallish
+              (hopefully) resultsets such that the retrieval query
+             for any set is not too slow; specify minimum number
+             of pages for which to retrieve revisions
+              Default: 1
+maxrevs    -- maximum number of revisions to retrieve at one time,
+             subject to the minpages setting
+              Default: 50000
+
+The above options do not have to be specified in the config file,
 since default values are provided.
 
 === Other formats section (i.e.: [otherformats])
@@ -236,5 +242,26 @@
                 compression of pages-articles.
                       Default value: 0 (no multistream files produced)
 
-The above options do not have to be specified in the config file, 
+The above options do not have to be specified in the config file,
 since default values are provided.
+
+=== Per-wiki configuration
+The following settings may be overriden for specific wikis by specifying
+their name (the name of the db in the database) as a section header,
+e.g. [elwiktionary]:
+
+dir
+user
+password
+max_allowed_packet
+orderrevs
+minpages
+maxrevs
+multistream
+chunksEnabled
+jobsperbatch
+pagesPerChunkHistory
+pagesPerChunkAbstract
+chunksForAbstract
+checkpointTime
+recombineHistory
diff --git a/xmldumps-backup/samples/wikidump.conf.sample b/xmldumps-backup/samples/wikidump.conf.sample
index f0c1911..6394952 100644
--- a/xmldumps-backup/samples/wikidump.conf.sample
+++ b/xmldumps-backup/samples/wikidump.conf.sample
@@ -4,17 +4,17 @@
 dblist=/home/ariel/src/mediawiki/testing/backup/all.dblist
 skipdblist=/home/ariel/src/mediawiki/testing/backup/skip.dblist
 privatelist=/home/ariel/src/mediawiki/testing/backup/private.dblist
-flaggedrevslist=/home/ariel/src/mediawiki/testing/backup/flagged.dblist
-wikidatalist=/home/ariel/src/mediawiki/testing/backup/wikidata.dblist
-biglist=/home/ariel/src/mediawiki/testing/backup/big.dblist
+flowlist=/home/ariel/src/mediawiki/testing/backup/flow.dblist
 dir=/home/ariel/src/mediawiki/1.16wmf4/phase3
-forcenormal=0
+tablejobs=/home/ariel/srv/mediawiki/testing/backup/tablejobs.yaml
 
 [output]
 public=/home/ariel/src/mediawiki/testing/dumps/public
 private=/home/ariel/src/mediawiki/testing/dumps/private
+temp=/home/ariel/src/mediawiki/testing/dumps/temp
 index=backup-index.html
 webroot=http://localhost/mydumps
+templatedir=/home/ariel/src/mediawiki/testing/dumps/templs
 
 [reporting]
 staleage=3600
@@ -26,6 +26,7 @@
 [database]
 user=root
 password=""
+max_allowed_packet=32M
 
 [tools]
 php=/usr/bin/php
@@ -44,3 +45,9 @@
 chunksEnabled=1
 pagesPerChunkHistory=10000,50000,50000,50000,50000
 pagesPerChunkAbstract=100000,100000
+
+[otherformats]
+multistream=1
+
+[elwikt]
+dir=/var/www/html/elwikt
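
Note on the per-wiki configuration documented in the README.config hunk above: the following is a minimal Python sketch of how a per-wiki section such as [elwikt] could take precedence over a general section. It uses the standard-library configparser module. The helper name get_setting and its lookup order are assumptions made for illustration; they are not taken from the dump scripts themselves.

```python
import configparser

# Trimmed-down config in the style of wikidump.conf.sample.
SAMPLE = """
[wiki]
dir=/home/ariel/src/mediawiki/1.16wmf4/phase3

[otherformats]
multistream=1

[elwikt]
dir=/var/www/html/elwikt
"""


def get_setting(conf, wiki, section, option, default=None):
    """Hypothetical helper: prefer the per-wiki section, then the general one."""
    if conf.has_option(wiki, option):
        return conf.get(wiki, option)
    if conf.has_option(section, option):
        return conf.get(section, option)
    return default


conf = configparser.ConfigParser()
conf.read_string(SAMPLE)

# elwikt has its own [elwikt] section, so its dir overrides the [wiki] value.
print(get_setting(conf, "elwikt", "wiki", "dir"))
# enwiki has no section of its own and falls back to [wiki].
print(get_setting(conf, "enwiki", "wiki", "dir"))
```

A wiki with no section of its own simply gets the value from the general section, or the supplied default if the option is absent everywhere.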

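Note on pagesPerChunkHistory as described in the Chunks section of README.config: the sketch below works through the two documented forms (a comma-separated list, or a single number) and reproduces the page ranges given in the README examples. The function name chunk_ranges and the max_page_id parameter are hypothetical, introduced only for this illustration, and the list entries are treated as chunk sizes because that is what the README example implies.

```python
def chunk_ranges(setting, max_page_id):
    """Return (first, last) page ID pairs for a pagesPerChunkHistory-style value.

    Hypothetical helper; max_page_id stands in for the highest page ID of
    the wiki and is assumed to be larger than the sum of the listed sizes.
    """
    sizes = [int(item) for item in setting.split(",")]
    ranges = []
    start = 1
    if len(sizes) == 1:
        # Single number: equal-sized chunks until the pages run out.
        size = sizes[0]
        while start <= max_page_id:
            ranges.append((start, min(start + size - 1, max_page_id)))
            start += size
        return ranges
    # List: each entry but the last gives the size of one chunk;
    # the final chunk runs through the end.
    for size in sizes[:-1]:
        ranges.append((start, start + size - 1))
        start += size
    ranges.append((start, max_page_id))
    return ranges


for first, last in chunk_ranges("5000,5000,100000,100000", 200000):
    print(first, "through", last)
```

With the single-number form, chunk_ranges("50000", 120000) yields 1 through 50000, 50001 through 100000, and a final shorter chunk ending at the assumed maximum page ID.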
-- 
To view, visit https://gerrit.wikimedia.org/r/325945

Gerrit-MessageType: merged
Gerrit-Change-Id: I6a80ecdee474449d200979fca2ffe0839f45f0a4
Gerrit-PatchSet: 2
Gerrit-Project: operations/dumps
Gerrit-Branch: master
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>
