ArielGlenn has uploaded a new change for review. ( 
https://gerrit.wikimedia.org/r/377231 )

Change subject: convert 'other' dump cron jobs into role/profile under dumps
......................................................................

convert 'other' dump cron jobs into role/profile under dumps

[WIP] incomplete, broken, etc etc

This also includes the nfs client and server setup, since that is
a requirement for any dumps to run.

cron jobs are split into:
* weekly jobs, all listed in one bash script
* daily jobs, all listed in another bash script
* wikidata jobs, all listed in a third bash script

No cron jobs will be activated with this commit; all jobs are commented
out.  Activation would require removing the corresponding code from
the existing snapshot module and will be done one at a time.

This will be split into a separate commit:

while we set up the new modules, we also do some cleanup on the
bash scripts that will run out of cron:
* convert tabs to spaces
* double quotes and curly brackets around var names as needed
* full paths to executables
* php, gzip, bzip2 paths always read from dumps config file
* use $() instead of backquotes
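
As a rough illustration of two of the items above (quoting with curly
brackets, and $() instead of backquotes), here is a minimal sketch. The
helper names and the sample path are hypothetical and do not appear in
the scripts touched by this change:

```shell
#!/bin/bash
# Sketch only: count_args, old_style and new_style are hypothetical
# helpers, not functions from the dumps scripts.

# Report how many arguments were actually received, which makes
# word splitting visible.
count_args() {
    echo "$#"
}

# Old style: backquotes and a bare variable. A value containing a
# space is split into several arguments.
old_style() {
    dir=$1
    echo `count_args $dir/latest`
}

# Cleaned-up style: $() and a quoted, braced variable. The value
# stays a single argument whatever it contains.
new_style() {
    local dir="$1"
    echo "$( count_args "${dir}/latest" )"
}
```

Calling both helpers with a path such as "/tmp/my dumps" shows the
difference: the old style passes two arguments on to count_args, the
cleaned-up style passes one.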

Bug: T175528

Change-Id: Ib3ed35e886d330e6d9112791b84e44d493ffaaea
---
M hieradata/common.yaml
R modules/dumps/files/htmldumps/nginx.zim.conf
A modules/dumps/files/otherdumps/dailies.sh
A modules/dumps/files/otherdumps/dump_functions.sh
A modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
A modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
A modules/dumps/files/otherdumps/weeklies.sh
A modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
A modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
A modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
A modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
A modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
A modules/dumps/files/otherdumps/wikidata-weeklies.sh
A modules/dumps/files/otherdumps/wikidata/dcatconfig.json
A modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
A modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
A modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
R modules/dumps/files/web/xmldumps/favicon.ico
R modules/dumps/files/web/xmldumps/logrotate.conf
A modules/dumps/manifests/addchangesdumps/README.txt
A modules/dumps/manifests/generation/client/nfs.pp
R modules/dumps/manifests/generation/server/nfs.pp
A modules/dumps/manifests/otherdumps.pp
A modules/dumps/manifests/otherdumps/common.pp
A modules/dumps/manifests/otherdumps/config.pp
A modules/dumps/manifests/otherdumps/daily.pp
A modules/dumps/manifests/otherdumps/daily/mediatitles.pp
A modules/dumps/manifests/otherdumps/weekly.pp
A modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
A modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
A modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
A modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
A modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
A modules/dumps/manifests/otherdumps/wikidata.pp
A modules/dumps/manifests/otherdumps/wikidata/common.pp
A modules/dumps/manifests/otherdumps/wikidata/json.pp
A modules/dumps/manifests/otherdumps/wikidata/rdf.pp
R modules/dumps/manifests/web/xmldumps.pp
R modules/dumps/manifests/web/zim.pp
A modules/dumps/manifests/xmldumps/README.txt
R modules/dumps/nfs/dirs.pp
R modules/dumps/templates/nfs/default-nfs-common.erb
R modules/dumps/templates/nfs/default-nfs-kernel-server.erb
R modules/dumps/templates/nfs/nfs_exports.erb
A modules/dumps/templates/otherdumps/otherdumps.conf.erb
45 files changed, 1,391 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet 
refs/changes/31/377231/1

diff --git a/hieradata/common.yaml b/hieradata/common.yaml
index 9f8d0d0..eb600b8 100644
--- a/hieradata/common.yaml
+++ b/hieradata/common.yaml
@@ -312,6 +312,12 @@
   - snapshot1005.eqiad.wmnet
   - snapshot1006.eqiad.wmnet
   - snapshot1007.eqiad.wmnet
+dumps_client_config:
+  repodir: /srv/deployment/dumps/dumps/xmldumps-backup
+  confsdir: /etc/dumps/confs
+  nfsmount: /mnt/dumpsdata
+  otherdumpsdir: /mnt/dumpsdata/other
+  apachedir: /srv/mediawiki
 
 # Schemas names that match this regex
 # will not be produced to the eventlogging-valid-mixed
diff --git a/modules/dumps/files/nginx.zim.conf 
b/modules/dumps/files/htmldumps/nginx.zim.conf
similarity index 100%
rename from modules/dumps/files/nginx.zim.conf
rename to modules/dumps/files/htmldumps/nginx.zim.conf
diff --git a/modules/dumps/files/otherdumps/dailies.sh 
b/modules/dumps/files/otherdumps/dailies.sh
new file mode 100644
index 0000000..18b687c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/dailies.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/dailies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/bin/find ${otherdir}/pagetitles/ -maxdepth 1 -type d -mtime +90 -exec rm 
-rf {} \; ; /usr/bin/find ${otherdir}/mediatitles/ -maxdepth 1 -type d -mtime 
+90 -exec rm -rf {} \;
+#cd ${repodir}; /usr/bin/python onallwikis.py --configfile 
${confsdir}/wikidump.conf.monitor  --filenameformat 
'{w}-{d}-all-titles-in-ns-0.gz' --outdir '${otherdir}/pagetitles/{d}' --query 
"'select page_title from page where page_namespace=0;'"
+#cd ${repodir}; /usr/bin/python onallwikis.py --configfile 
${confsdir}/wikidump.conf.monitor  --filenameformat 
'{w}-{d}-all-media-titles.gz' --outdir '${otherdir}/mediatitles/{d}' --query 
"'select page_title from page where page_namespace=6;'"
diff --git a/modules/dumps/files/otherdumps/dump_functions.sh 
b/modules/dumps/files/otherdumps/dump_functions.sh
new file mode 100644
index 0000000..858d935
--- /dev/null
+++ b/modules/dumps/files/otherdumps/dump_functions.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/dump_functions.sh
+#############################################################
+#
+# functions used by "other" dumps cron jobs (not the main xml/sql ones)
+
+source /usr/local/etc/set_dump_dirs.sh
+
+checkval() {
+    setting=$1
+    value=$2
+    if [ -z "$value" -o "$value" == "null" ]; then
+        echo "failed to retrieve value of $setting from $configfile" >& 2
+        exit 1
+    fi
+}
+
+getsetting() {
+    results=$1
+    section=$2
+    setting=$3
+    echo "$results" | /usr/bin/jq -M -r ".$section.$setting"
+}
+
+
+standard_usage() {
+    echo "Usage: $0 --confsdir <path> --repodir <path> --otherdumpsdir <path>"
+    echo
+    echo "  --confsdir       path to dir with configuration files for dump 
generation"
+    echo "  --repodir        path to dir with scripts for dump generation"
+    echo "  --otherdumpsdir  path to dir where misc dump output files are 
written"
+}
+
+get_standard_opts() {
+    while [ $# -gt 0 ]; do
+        if [ $1 == "--confsdir" ]; then
+               confsdir="$2"
+               shift; shift;
+        elif [ $1 == "--repodir" ]; then
+               repodir="$2"
+               shift; shift;
+        elif [ $1 == "--otherdumpsdir" ]; then
+               otherdumpsdir="$2"
+               shift; shift;
+        else
+               echo "$0: Unknown option $1"
+               standard_usage
+                exit 1
+        fi
+    done
+}
diff --git a/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf 
b/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
new file mode 100644
index 0000000..b78d423
--- /dev/null
+++ b/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
@@ -0,0 +1,11 @@
+# This file is managed by puppet
+# puppet:///modules/dumps/otherdumps/logrot/logrotate.categoriesrdf
+#
+/var/log/categoriesrdf/*.log {
+    daily
+    compress
+    delaycompress
+    missingok
+    maxage 22
+    nocreate
+}
diff --git a/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump 
b/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
new file mode 100644
index 0000000..fd033d8
--- /dev/null
+++ b/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
@@ -0,0 +1,11 @@
+# This file is managed by puppet
+# puppet:///modules/dumps/otherdumps/logrot/logrotate.cirrusdump
+#
+/var/log/cirrusdump/*.log {
+    daily
+    compress
+    delaycompress
+    missingok
+    maxage 22
+    nocreate
+}
diff --git a/modules/dumps/files/otherdumps/weeklies.sh 
b/modules/dumps/files/otherdumps/weeklies.sh
new file mode 100644
index 0000000..a792b7f
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/local/bin/create-media-per-project-lists.sh
+#/usr/local/bin/dumpcirrussearch.sh --config ${confsdir}/wikidump.conf
+#/usr/local/bin/dumpcategoriesrdf.sh --config ${confsdir}/wikidump.conf --list 
${apachedir}/dblists/categories-rdf.dblist
+#/usr/bin/find ${otherdir}/contenttranslation/ -maxdepth 1 -type d -mtime +90 
-exec rm -rf {} \;
+#/usr/local/bin/dumpcontentxlation.sh
+#/usr/bin/find ${otherdir}/globalblocks/ -maxdepth 1 -type d -mtime +90 -exec 
rm -rf {} \;
+#/usr/local/bin/dump-global-blocks.sh --config ${confsdir}/wikidump.conf
+#/usr/local/bin/create-media-per-project-lists.sh
diff --git 
a/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh 
b/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
new file mode 100755
index 0000000..d0278c9
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/create-media-per-project-lists.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+DATE=$( /bin/date '+%Y%m%d' )
+outputdir="${otherdumpsdir}/imageinfo/$DATE"
+configfile="${confsdir}/wikidump.conf.media"
+errors=0
+
+cd "$repodir"
+
+/usr/bin/python "${repodir}/onallwikis.py" --outdir "$outputdir" \
+       --config "$configfile" --nooverwrite \
+       --query "'select img_name, img_timestamp from image;'" \
+       --filenameformat "{w}-{d}-local-wikiqueries.gz"
+if [ $? -ne 0 ]; then
+    echo "failed sql dump of image tables"
+    errors=1
+fi
+
+basewiki=commonswiki
+
+/usr/bin/python "${repodir}/onallwikis.py" --outdir "$outputdir" \
+       --base "$basewiki" \
+       --config "$configfile" --nooverwrite \
+       --query "'select gil_to from globalimagelinks where gil_wiki= 
\"{w}\";'" \
+       --filenameformat "{w}-{d}-remote-wikiqueries.gz"
+
+if [ $? -ne 0 ]; then
+    echo "failed sql dump of globalimagelink tables"
+    errors=1
+fi
+
+exit $errors
diff --git a/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh 
b/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
new file mode 100644
index 0000000..35b8cc6
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dump-global-blocks.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+get_db_host() {
+    apachedir=$1
+
+    multiversionscript="${apachedir}/multiversion/MWScript.php"
+    if [ -e "$multiversionscript" ]; then
+        host=$( "$php" -q "$multiversionscript" 
extensions/CentralAuth/maintenance/getCentralAuthDBInfo.php --wiki="aawiki" ) 
|| { echo "$host" >& 2; host=""; }
+    fi
+    if [ -z "$host" ]; then
+        echo "can't locate db server for centralauth, exiting." >& 2
+        exit 1
+    fi
+    echo $host
+}
+
+get_db_user() {
+    apachedir=$1
+
+    multiversionscript="${apachedir}/multiversion/MWScript.php"
+    if [ -e "$multiversionscript" ]; then
+        db_user=$( echo 'echo $wgDBadminuser;' | "$php" "$multiversionscript" 
eval.php aawiki )
+    fi
+    if [ -z "$db_user" ]; then
+        echo "can't get db user name, exiting." >& 2
+        exit 1
+    fi
+    echo $db_user
+}
+
+get_db_pass() {
+    apachedir=$1
+
+    multiversionscript="${apachedir}/multiversion/MWScript.php"
+    if [ -e "$multiversionscript" ]; then
+        db_pass=$( echo 'echo $wgDBadminpassword;' | "$php" 
"$multiversionscript" eval.php aawiki )
+    fi
+    if [ -z "$db_pass" ]; then
+        echo "can't get db password, exiting." >& 2
+        exit 1
+    fi
+    echo $db_pass
+}
+
+dump_tables() {
+    tables=$1
+    outputdir=$2
+    mysqldump=$3
+    gzip=$4
+    db_user=$5
+    db_pass=$6
+
+    today=$( date +%Y%m%d )
+    dir="$outputdir/$today"
+    mkdir -p "$dir"
+    for t in $tables; do
+        outputfile="${dir}/${today}-${t}.gz"
+        if [ "$dryrun" == "true" ]; then
+            echo "would run:"
+            echo -n "${mysqldump} -u ${db_user} -p${db_pass} -h ${host} --opt 
--quick --skip-add-locks --skip-lock-tables centralauth ${t}"
+            echo  "| ${gzip} > ${outputfile}"
+        else
+            # echo "dumping $t into $outputfile"
+            "$mysqldump" -u "$db_user" -p"$db_pass" -h "$host" \
+                 --opt --quick --skip-add-locks --skip-lock-tables \
+                 centralauth "$t" | "$gzip" > "$outputfile"
+        fi
+    done
+}
+
+usage() {
+    echo "Usage: $0 [--config <pathtofile>] [--dryrun]" >& 2
+    echo >& 2
+    echo "  --config   path to configuration file for dump generation" >& 2
+    echo "             (default value: ${confsdir}/wikidump.conf)" >& 2
+    echo "  --dryrun   don't run dump, show what would have been done" >& 2
+    exit 1
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+while [ $# -gt 0 ]; do
+    if [ $1 == "--config" ]; then
+        configfile="$2"
+        shift; shift
+    elif [ $1 == "--dryrun" ]; then
+        dryrun="true"
+        shift
+    else
+        echo "$0: Unknown option $1" >& 2
+        usage
+    fi
+done
+
+args="wiki:dir;tools:gzip,mysqldump,php"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile 
"$configfile" --args "$args" )
+
+apachedir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+mysqldump=$( getsetting "$results" "tools" "mysqldump" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "apachedir" "gzip" "mysqldump" "php"; do
+    checkval "$settingname" "${!settingname}"
+done
+
+outputdir="${otherdumpsdir}/globalblocks"
+
+host=$( get_db_host "$apachedir" ) || exit 1
+db_user=$( get_db_user "$apachedir" ) || exit 1
+db_pass=$( get_db_pass "$apachedir" ) || exit 1
+
+dump_tables "globalblocks" "$outputdir" "$mysqldump" "$gzip" "$db_user" 
"$db_pass"
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh 
b/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
new file mode 100755
index 0000000..bad0be3
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
@@ -0,0 +1,134 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcategoriesrdf.sh
+#############################################################
+#
+# Generate an RDF dump of categories for all wikis in
+# categories-rdf list and remove old ones.
+
+source /usr/local/etc/dump_functions.sh
+
+usage() {
+       echo "Usage: $0 --list wikis.dblist [--config <pathtofile>] [--dryrun]"
+       echo
+       echo "  --config  path to configuration file for dump generation"
+       echo "            (default value: ${confsdir}/wikidump.conf)"
+       echo "  --list    file containing list of the wikis to dump"
+       echo "  --dryrun  don't run dump, show what would have been done"
+       exit 1
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+dumpFormat="ttl"
+dbList="categories-rdf"
+
+while [ $# -gt 0 ]; do
+       if [ $1 == "--config" ]; then
+               configfile="$2"
+               shift; shift;
+       elif [ $1 == "--dryrun" ]; then
+               dryrun="true"
+               shift
+       elif [ $1 == "--list" ]; then
+               dbList="$2"
+               shift; shift;
+       else
+               echo "$0: Unknown option $1"
+               usage
+       fi
+done
+
+if [ -z "$dbList" -o ! -f "$dbList" ]; then
+       echo "Valid wiki list must be specified"
+       echo "Exiting..."
+       exit 1
+fi
+
+if [ ! -f "$configfile" ]; then
+       echo "Could not find config file: $configfile"
+       echo "Exiting..."
+       exit 1
+fi
+
+args="wiki:dir,privatelist;tools:gzip,php;output:public"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile 
"$configfile" --args "$args" )
+
+deployDir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+privateList=$( getsetting "$results" "wiki" "privatelist" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "deployDir" "gzip" "privateList" "php"; do
+    checkval "$settingname" "${!settingname}"
+done
+
+today=$( /bin/date +'%Y%m%d' )
+targetDirBase="${otherdumpsdir}/categoriesrdf"
+targetDir="${targetDirBase}/${today}"
+timestampsDir="${targetDirBase}/lastdump"
+multiVersionScript="${deployDir}/multiversion/MWScript.php"
+
+# remove old datasets
+daysToKeep=70
+cutOff=$(( $( /bin/date +%s ) - $(( $daysToKeep + 1 )) * 24 * 3600))
+if [ -d "$targetDirBase" ]; then
+       for folder in $(/bin/ls -d -r "${targetDirBase}/"*); do
+                creationTime=$( /bin/date --utc --date="$(basename $folder )" 
+%s 2>/dev/null)
+                if [ -n "$creationTime" ]; then
+                    if [ "$cutOff" -gt "$creationTime" ]; then
+                       if [ "$dryrun" == "true" ]; then
+                               echo /bin/rm "${folder}/"*".${dumpFormat}.gz"
+                               echo /bin/rmdir "${folder}"
+                       else
+                               /bin/rm -f "${folder}/"*".${dumpFormat}.gz"
+                               /bin/rmdir "${folder}"
+                       fi
+                   fi
+               fi
+       done
+fi
+
+# create today's folder
+if [ "$dryrun" == "true" ]; then
+       echo /bin/mkdir -p "$targetDir"
+       echo /bin/mkdir -p "$timestampsDir"
+else
+       if ! /bin/mkdir -p "$targetDir"; then
+               echo "Can't make output directory: $targetDir"
+               echo "Exiting..."
+               exit 1
+       fi
+       if ! /bin/mkdir -p "$timestampsDir"; then
+               echo "Can't make output directory: $timestampsDir"
+               echo "Exiting..."
+               exit 1
+       fi
+fi
+
+# iterate over configured wikis
+/bin/cat "$dbList" | while read wiki; do
+       # exclude all private wikis
+       if ! /bin/egrep -q "^${wiki}$" "$privateList"; then
+               filename="${wiki}-${today}-categories"
+               targetFile="${targetDir}/${filename}.${dumpFormat}.gz"
+               tsFile="${timestampsDir}/${wiki}-categories.last"
+               if [ "$dryrun" == "true" ]; then
+                       echo "${php} ${multiVersionScript} 
maintenance/dumpCategoriesAsRdf.php --wiki=${wiki} --format=${dumpFormat} 2> 
/var/log/categoriesrdf/${filename}.log | ${gzip} > ${targetFile}"
+               else
+                        "$php" "$multiVersionScript" 
maintenance/dumpCategoriesAsRdf.php \
+                            "--wiki=${wiki}" \
+                            "--format=${dumpFormat}" 2> 
"/var/log/categoriesrdf/${filename}.log" \
+                           | "$gzip" > "$targetFile"
+                       echo "$today" > "$tsFile"
+               fi
+       fi
+done
+
+
+# Maintain a 'latest' symlink always pointing at the most recently completed 
dump
+if [ "$dryrun" == "false" ]; then
+       cd "$targetDirBase"
+       /bin/ln -snf "$today" "latest"
+fi
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh 
b/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
new file mode 100644
index 0000000..ba81f9c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
@@ -0,0 +1,147 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcirrussearch.sh
+#############################################################
+#
+# Generate a json dump of cirrussearch indices for all enabled
+# wikis and remove old ones.
+
+source /usr/local/etc/dump_functions.sh
+
+usage() {
+       echo "Usage: $0 [--config <pathtofile>] [--dryrun]"
+       echo
+       echo "  --config  path to configuration file for dump generation"
+       echo "            (default value: ${confsdir}/wikidump.conf)"
+       echo "  --dryrun  don't run dump, show what would have been done"
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+while [ $# -gt 0 ]; do
+       if [ $1 == "--config" ]; then
+               configfile="$2"
+               shift; shift;
+       elif [ $1 == "--dryrun" ]; then
+               dryrun="true"
+               shift
+       else
+               echo "$0: Unknown option $1"
+               usage; exit 1
+       fi
+done
+
+if [ ! -f "$configfile" ]; then
+       echo "Could not find config file: $configfile"
+       echo "Exiting..."
+       exit 1
+fi
+
+args="wiki:dir,dblist,privatelist;tools:gzip,php;output:public"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile 
"$configfile" --args "$args" )
+
+deployDir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+allList=$( getsetting "$results" "wiki" "dblist" ) || exit 1
+privateList=$( getsetting "$results" "wiki" "privatelist" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "deployDir" "allList" "privateList" "gzip" "php"; do
+    checkval "$settingname" "${!settingname}"
+done
+
+today=$( /bin/date +'%Y%m%d' )
+targetDirBase="${otherdumpsdir}/cirrussearch"
+targetDir="${targetDirBase}/${today}"
+multiVersionScript="${deployDir}/multiversion/MWScript.php"
+
+# remove old datasets
+daysToKeep=70
+cutOff=$(( $( /bin/date +%s ) - $(( $daysToKeep + 1 )) * 24 * 3600))
+if [ -d "$targetDirBase" ]; then
+       for folder in $(/bin/ls -d -r "${targetDirBase}/"*); do
+                creationTime=$( /bin/date --utc --date="$(basename $folder)" 
+%s 2>/dev/null )
+                if [ -n "$creationTime" ]; then
+                    if [ "$cutOff" -gt "$creationTime" ]; then
+                       if [ "$dryrun" == "true" ]; then
+                               echo /bin/rm "${folder}/"*.json.gz
+                               echo /bin/rmdir "$folder"
+                       else
+                               /bin/rm -f "${folder}/"*.json.gz
+                               /bin/rmdir "$folder"
+                       fi
+                   fi
+               fi
+       done
+fi
+
+# create today's folder
+if [ "$dryrun" == "true" ]; then
+       echo /bin/mkdir -p "$targetDir"
+else
+       if ! /bin/mkdir -p "$targetDir"; then
+               echo "Can't make output directory: ${targetDir}"
+               echo "Exiting..."
+               exit 1
+       fi
+fi
+
+# iterate over all known wikis
+cat "$allList" | while read wiki; do
+       # exclude all private wikis
+       if ! /bin/egrep -q "^${wiki}$" "$privateList"; then
+               # most wikis only have two indices
+               types="content general"
+               # commonswiki is special, it also has a file index
+               if [ "$wiki" == "commonswiki" ]; then
+                       types="${types} file"
+               fi
+               # run the dump for each index type
+               for type in $types; do
+                       filename="${wiki}-${today}-cirrussearch-${type}"
+                       targetFile="${targetDir}/${filename}.json.gz"
+                       if [ "$dryrun" == "true" ]; then
+                                echo "${php} ${multiVersionScript} 
extensions/CirrusSearch/maintenance/dumpIndex.php --wiki=${wiki} 
--indexType=${type} 2> /var/log/cirrusdump/cirrusdump-${filename}.log | ${gzip} 
> ${targetFile}"
+                       else
+                               "$php" "$multiVersionScript" \
+                                       
extensions/CirrusSearch/maintenance/dumpIndex.php \
+                                       "--wiki=${wiki}" \
+                                        "--indexType=${type}" \
+                                       2> 
"/var/log/cirrusdump/cirrusdump-${filename}.log" \
+                                       | "$gzip" > "$targetFile"
+                       fi
+               done
+       fi
+done
+
+# dump the metastore index (contains persistent states used by cirrus
+# administrative tasks). This index is cluster scoped and not bound to a
+# particular wiki (we pass --wiki to mwscript because it's mandatory but this
+# option is not used by the script itself)
+clusters="eqiad codfw"
+for cluster in $clusters; do
+       filename="cirrus-metastore-${cluster}-${today}"
+       targetFile="${targetDir}/${filename}.json.gz"
+       if [ "$dryrun" == "true" ]; then
+               echo "${php} ${multiVersionScript} 
extensions/CirrusSearch/maintenance/metastore.php --wiki=metawiki --dump 
--cluster=${cluster} 2>> /var/log/cirrusdump/cirrusdump-${filename}.log | 
${gzip} > ${targetFile}"
+       else
+               "$php" "$multiVersionScript" \
+                       extensions/CirrusSearch/maintenance/metastore.php \
+                       --wiki=metawiki \
+                       --dump \
+                       "--cluster=${cluster}" \
+                       2>> "/var/log/cirrusdump/cirrusdump-${filename}.log" \
+                       | "$gzip" > "$targetFile"
+       fi
+done
+
+
+
+# Maintain a 'current' symlink always pointing at the most recently completed 
dump
+if [ "$dryrun" == "false" ]; then
+       cd "$targetDirBase"
+        /bin/rm -f "current"
+       /bin/ln -s "$today" "current"
+fi
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh 
b/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
new file mode 100644
index 0000000..b4ed5da
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
@@ -0,0 +1,84 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcontentxlation.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+do_dump() {
+    format=$1
+    plaintext=$2
+    command="${php} ${multiversionscript} ${xlationscript} --wiki enwiki -q 
--split-at 500 --outputdir ${outdir} --compression gzip --format ${format}"
+
+    if [ -n "$plaintext" ]; then
+       command="${command} --plaintext"
+    fi
+
+    if [ "${dryrun}" == "true" ]; then
+        echo "$command"
+    else
+        $command
+    fi
+}
+
+usage() {
+    echo "Usage: $0 [--config <pathtofile>] [--dryrun]"
+    echo
+    echo "  --config   path to configuration file for dump generation"
+    echo "             (default value: ${confsdir}/wikidump.conf)"
+    echo "  --dryrun   display dump command instead of running it"
+    exit 1
+}
+
+#####################
+# MAIN
+#####################
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+#####################
+# Get cmdline args
+#####################
+
+while [ $# -gt 0 ]; do
+    if [ $1 == "--config" ]; then
+        configfile="$2"
+        shift; shift
+    elif [ $1 == "--dryrun" ]; then
+        dryrun="true"
+        shift
+    else
+        echo "$0: Unknown option $1"
+        usage
+    fi
+done
+
+#####################
+# Get config settings
+#####################
+
+args="wiki:dir;tools:php"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile 
"$configfile" --args "$args" )
+
+apachedir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "apachedir" "php"; do
+    checkval "$settingname" "${!settingname}"
+done
+
+####################
+# Dump
+####################
+
+today=$( /bin/date +%Y%m%d )
+outdir="${otherdumpsdir}/contenttranslation/${today}"
+/bin/mkdir -p "$outdir" || exit 1
+multiversionscript="${apachedir}/multiversion/MWScript.php"
+xlationscript="extensions/ContentTranslation/scripts/dump-corpora.php"
+
+do_dump json
+do_dump json plaintext
+do_dump tmx plaintext
diff --git a/modules/dumps/files/otherdumps/wikidata-weeklies.sh 
b/modules/dumps/files/otherdumps/wikidata-weeklies.sh
new file mode 100644
index 0000000..b80428f
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata-weeklies.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/wikidata-weeklies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/local/bin/dumpwikidatajson.sh
+#/usr/local/bin/dumpwikidatardf.sh
diff --git a/modules/dumps/files/otherdumps/wikidata/dcatconfig.json 
b/modules/dumps/files/otherdumps/wikidata/dcatconfig.json
new file mode 100644
index 0000000..bf6a408
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dcatconfig.json
@@ -0,0 +1,54 @@
+{
+    "directory": null,
+    "api-enabled": true,
+    "dumps-enabled": true,
+    "uri": "https://www.wikidata.org/about",
+    "catalog-homepage": "https://www.wikidata.org",
+    "catalog-issued": "2012-10-30",
+    "catalog-license": "http://creativecommons.org/publicdomain/zero/1.0/",
+    "catalog-i18n": 
"https://www.wikidata.org/w/index.php?title=MediaWiki:DCAT.json&action=raw",
+    "keywords": ["data store", "semantic", "knowledgebase", "Wikimedia", "user 
generated content", "UGC", "Wikipedia", "Wikidata"],
+    "themes": ["1428", "441", "2191", "384", "7374"],
+    "publisher": {
+        "name": "Wikimedia Foundation",
+        "homepage": "http://wikimediafoundation.org/",
+        "email": "[email protected]",
+        "publisherType": "NonProfitOrganisation"
+    },
+    "contactPoint": {
+        "vcardType" : "Organization",
+        "name": "Wikidata information team",
+        "email": "[email protected]"
+    },
+    "ld-info": {
+        "accessURL": "https://www.wikidata.org/entity/",
+        "mediatype": {
+            "json": "application/json",
+            "n3": "application/n-triples",
+            "rdf": "application/rdf+xml",
+            "ttl": "text/turtle",
+            "html": "text/html"
+        },
+        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+    },
+    "api-info": {
+        "accessURL": "https://www.wikidata.org/w/api.php",
+        "mediatype": {
+            "json": "application/json",
+            "xml": "application/xml"
+        },
+        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+    },
+    "dump-info": {
+        "accessURL": "https://dumps.wikimedia.org/wikidatawiki/entities/$1",
+        "mediatype": {
+            "json": "application/json",
+            "ttl": "text/turtle"
+        },
+        "compression": {
+            "gzip": "gz",
+            "bzip2": "bz2"
+        },
+        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+    }
+}
diff --git a/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh 
b/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
new file mode 100644
index 0000000..5df293c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
@@ -0,0 +1,117 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatajson.sh
+#############################################################
+#
+# Generate a json dump for Wikidata and remove old ones.
+#
+# Marius Hoch < [email protected] >
+
+. /usr/local/bin/wikidatadumps-shared.sh
+
+filename="wikidata-${today}-all"
+targetFileGzip="${targetDir}/${filename}.json.gz"
+targetFileBzip2="${targetDir}/${filename}.json.bz2"
+failureFile=/tmp/dumpwikidatajson-failure
+mainLogFile="/var/log/wikidatadump/dumpwikidatajson-${filename}-main.log"
+
+shards=5
+
+# Try to create the dump (up to three times).
+retries=0
+
+while true; do
+       i=0
+       /bin/rm -f "$failureFile"
+
+       while [ $i -lt $shards ]; do
+               (
+                       set -o pipefail
+                       errorLog="/var/log/wikidatadump/dumpwikidatajson-${filename}-${i}.log"
+                        "$php" "$multiversionscript" \
+                             extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpJson.php \
+                             --wiki wikidatawiki \
+                             --shard "$i" \
+                             --sharding-factor "$shards" \
+                             --snippet 2>> "$errorLog" \
+                            | "$gzip" -9 > "${tempDir}/wikidataJson.${i}.gz"
+                       exitCode=$?
+                       if [ $exitCode -gt 0 ]; then
+                               echo -e "\n\n($( date --iso-8601=minutes )) Process for shard ${i} failed with exit code ${exitCode}" >> "$errorLog"
+                               echo 1 > "$failureFile"
+
+                               #  Kill all remaining dumpers and start over.
+                               kill -- -$$
+                       fi
+               ) &
+               let i++
+       done
+
+       wait
+
+       if [ -f "$failureFile" ]; then
+               # Something went wrong, let's clean up and maybe retry. Leave logs in place.
+               /bin/rm -f "$failureFile"
+               /bin/rm -f "${tempDir}/wikidataJson."*.gz
+               let retries++
+               echo "($( date --iso-8601=minutes )) Dumping one or more shards failed. Retrying." >> "$mainLogFile"
+
+               if [ $retries -eq 3 ]; then
+                       exit 1
+               fi
+
+               # Wait 10 minutes (in case of database problems or other instabilities), then try again
+               sleep 600
+               continue
+       fi
+
+       break
+
+done
+
+# Open the json list
+echo '[' | "$gzip" -f > "${tempDir}/wikidataJson.gz"
+
+i=0
+while [ $i -lt $shards ]; do
+       tempFile="${tempDir}/wikidataJson.${i}.gz"
+       if [ ! -f "$tempFile" ]; then
+               echo "${tempFile} does not exist. Aborting." >> "$mainLogFile"
+               exit 1
+       fi
+       fileSize=$( stat --printf="%s" "$tempFile" )
+       if [ $fileSize -lt $( expr 10500000000 / $shards ) ]; then
+               echo "File size of ${tempFile} is only ${fileSize}. Aborting." >> "$mainLogFile"
+               exit 1
+       fi
+       /bin/cat "$tempFile" >> "${tempDir}/wikidataJson.gz"
+       /bin/rm "$tempFile"
+       let i++
+       if [ $i -lt $shards ]; then
+               # Shards don't end with commas so add commas to separate them
+               echo ',' | "$gzip" -f >> "${tempDir}/wikidataJson.gz"
+       fi
+done
+
+# Close the json list
+echo -e '\n]' | "$gzip" -f >> "${tempDir}/wikidataJson.gz"
+
+mv "${tempDir}/wikidataJson.gz" "$targetFileGzip"
+
+# Legacy directory (with legacy naming scheme)
+legacyDirectory="${publicDir}/other/wikidata"
+/bin/ln -s "../wikibase/wikidatawiki/${today}/${filename}.json.gz" "${legacyDirectory}/${today}.json.gz"
+find "$legacyDirectory" -name '*.json.gz' -mtime +$( expr $daysToKeep + 1 ) -delete
+
+# (Re-)create the link to the latest
+/bin/ln -fs "${today}/${filename}.json.gz" "${targetDirBase}/latest-all.json.gz"
+
+# Create the bzip2 from the gzip one and update the latest-all.json.bz2 link
+"$gzip" -dc "$targetFileGzip" | "$bzip2" -c > "${tempDir}/wikidataJson.bz2"
+/bin/mv "${tempDir}/wikidataJson.bz2" "$targetFileBzip2"
+/bin/ln -fs "${today}/${filename}.json.bz2" "${targetDirBase}/latest-all.json.bz2"
+
+pruneOldDirectories
+pruneOldLogs
+runDcat
diff --git a/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh b/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
new file mode 100755
index 0000000..bdb94f0
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
@@ -0,0 +1,127 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatardf.sh
+#############################################################
+#
+# Generate an RDF dump (ttl or nt) for Wikidata and remove old ones.
+#
+# Marius Hoch < [email protected] >
+
+. /usr/local/bin/wikidatadumps-shared.sh
+
+declare -A dumpNameToFlavor
+dumpNameToFlavor=(["all"]="full-dump" ["truthy"]="truthy-dump")
+
+dumpName="$1"
+
+if [ -z "$dumpName" ]; then
+       echo "No dump name given."
+       exit 1
+fi
+
+dumpFlavor=${dumpNameToFlavor["$dumpName"]}
+if [ -z "$dumpFlavor" ]; then
+       echo "Unknown dump name: ${dumpName}"
+       exit 1
+fi
+
+dumpFormat=$2
+
+if [[ "$dumpFormat" != "ttl" ]] && [[ "$dumpFormat" != "nt" ]]; then
+       echo "Unknown format: ${dumpFormat}"
+       exit 1
+fi
+
+filename="wikidata-${today}-${dumpName}-BETA"
+targetFileGzip="${targetDir}/${filename}.${dumpFormat}.gz"
+targetFileBzip2="${targetDir}/${filename}.${dumpFormat}.bz2"
+failureFile="/tmp/dumpwikidata${dumpFormat}-${dumpName}-failure"
+mainLogFile="/var/log/wikidatadump/dumpwikidata${dumpFormat}-${filename}-main.log"
+
+shards=5
+
+declare -A dumpNameToMinSize
+# Sanity check: Minimal size we expect each shard of a certain dump to have
+dumpNameToMinSize=(["all"]=`expr 12500000000 / $shards` ["truthy"]=`expr 7500000000 / $shards`)
+
+# Try to create the dump (up to three times).
+retries=0
+
+while true; do
+       i=0
+       /bin/rm -f "$failureFile"
+
+       while [ $i -lt $shards ]; do
+               (
+                       set -o pipefail
+                       errorLog="/var/log/wikidatadump/dumpwikidata${dumpFormat}-${filename}-${i}.log"
+                        "$php" "$multiversionscript" extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php \
+                             --wiki wikidatawiki \
+                             --shard "$i" \
+                             --sharding-factor "$shards" \
+                             --format "$dumpFormat" \
+                             --flavor "$dumpFlavor" 2>> "$errorLog" \
+                            | "$gzip" -9 > "${tempDir}/wikidata${dumpFormat}-${dumpName}.${i}.gz"
+                       exitCode=$?
+                       if [ $exitCode -gt 0 ]; then
+                               echo -e "\n\n($( date --iso-8601=minutes )) Process for shard ${i} failed with exit code ${exitCode}" >> "$errorLog"
+                               echo 1 > "$failureFile"
+
+                               #  Kill all remaining dumpers and start over.
+                               kill -- -$$
+                       fi
+               ) &
+               let i++
+       done
+
+       wait
+
+       if [ -f "$failureFile" ]; then
+               # Something went wrong, let's clean up and maybe retry. Leave logs in place.
+               /bin/rm -f "$failureFile"
+               /bin/rm -f "${tempDir}/wikidata${dumpFormat}-${dumpName}."*.gz
+               let retries++
+               echo "($( date --iso-8601=minutes )) Dumping one or more shards failed. Retrying." >> "$mainLogFile"
+
+               if [ $retries -eq 3 ]; then
+                       exit 1
+               fi
+
+               # Wait 10 minutes (in case of database problems or other instabilities), then try again
+               sleep 600
+               continue
+       fi
+
+       break
+
+done
+
+i=0
+while [ $i -lt $shards ]; do
+       tempFile="${tempDir}/wikidata$dumpFormat-${dumpName}.${i}.gz"
+       if [ ! -f "$tempFile" ]; then
+               echo "${tempFile} does not exist. Aborting." >> "$mainLogFile"
+               exit 1
+       fi
+       fileSize=$( stat --printf="%s" "$tempFile" )
+       if [ $fileSize -lt ${dumpNameToMinSize["$dumpName"]} ]; then
+               echo "File size of ${tempFile} is only ${fileSize}. Aborting." >> "$mainLogFile"
+               exit 1
+       fi
+       /bin/cat "$tempFile" >> "${tempDir}/wikidata${dumpFormat}-${dumpName}.gz"
+       /bin/rm "$tempFile"
+       let i++
+done
+
+/bin/mv "${tempDir}/wikidata${dumpFormat}-${dumpName}.gz" "$targetFileGzip"
+/bin/ln -fs "$today/$filename.$dumpFormat.gz" "$targetDirBase/latest-$dumpName.$dumpFormat.gz"
+
+"$gzip" -dc "$targetFileGzip" | "$bzip2" -c > "${tempDir}/wikidata${dumpFormat}-${dumpName}.bz2"
+/bin/mv "${tempDir}/wikidata${dumpFormat}-${dumpName}.bz2" "$targetFileBzip2"
+/bin/ln -fs "${today}/${filename}.${dumpFormat}.bz2" "${targetDirBase}/latest-${dumpName}.${dumpFormat}.bz2"
+
+
+pruneOldDirectories
+pruneOldLogs
+runDcat
diff --git a/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh b/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
new file mode 100644
index 0000000..848ea2e
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/wikidata/wikidatadumps-shared.sh
+#############################################################
+#
+# Shared variable and function declarations for creating Wikidata dumps
+#
+# Marius Hoch < [email protected] >
+
+source /usr/local/etc/dump_functions.sh
+configfile="${confsdir}/wikidump.conf"
+
+today=`date +'%Y%m%d'`
+daysToKeep=70
+
+args="wiki:dir;output:public,temp;tools:php,gzip,bzip2"
+results=`python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args"`
+
+apacheDir=`getsetting "$results" "wiki" "dir"` || exit 1
+publicDir=`getsetting "$results" "output" "public"` || exit 1
+tempDir=`getsetting "$results" "output" "temp"` || exit 1
+php=`getsetting "$results" "tools" "php"` || exit 1
+gzip=`getsetting "$results" "tools" "gzip"` || exit 1
+bzip2=`getsetting "$results" "tools" "bzip2"` || exit 1
+
+for settingname in "apacheDir" "publicDir" "tempDir" "php" "gzip" "bzip2"; do
+    checkval "$settingname" "${!settingname}"
+done
+
+targetDirBase=$publicDir/other/wikibase/wikidatawiki
+targetDir=$targetDirBase/$today
+
+multiversionscript="${apacheDir}/multiversion/MWScript.php"
+
+# Create the dir for the day: This may or may not already exist, we don't care
+mkdir -p "$targetDir"
+
+# Remove dump-folders we no longer need (keep $daysToKeep days)
+function pruneOldDirectories {
+       # Just to be sure: If this were empty the below would work on /
+       if [ -z "$targetDirBase" ]; then
+               echo "Empty \$targetDirBase"
+               exit 1
+       fi
+
+       cutOff=$(( `date +%s` - `expr $daysToKeep + 1` * 24 * 3600)) # Timestamp from $daysToKeep + 1 days ago
+       foldersToDelete=`ls -d -r $targetDirBase/*` # $targetDirBase is known to be non-empty
+       for folder in $foldersToDelete; do
+               # Try to get the unix time from the folder name, if this fails we'll just
+               # keep the folder (as it's not a valid date, thus hasn't been created by this script).
+               creationTime=$(date --utc --date="$(basename $folder)" +%s 2>/dev/null)
+               if [ -n "$creationTime" ] && [ "$cutOff" -gt "$creationTime" ]; then
+                       /bin/rm -rf $folder
+               fi
+       done
+}
+
+function pruneOldLogs {
+       # Remove old logs (keep 35 days)
+       find /var/log/wikidatadump/ -name 'dumpwikidata*-*-*.log' -mtime +36 -delete
+}
+
+function runDcat {
+        "$php" /usr/local/share/dcat/DCAT.php \
+                --config=/usr/local/etc/dcatconfig.json \
+                "--dumpDir=${targetDirBase}" \
+                "--outputDir=${targetDirBase}"
+}
diff --git a/modules/dumps/files/favicon.ico b/modules/dumps/files/web/xmldumps/favicon.ico
similarity index 100%
rename from modules/dumps/files/favicon.ico
rename to modules/dumps/files/web/xmldumps/favicon.ico
Binary files differ
diff --git a/modules/dumps/files/logrotate.conf b/modules/dumps/files/web/xmldumps/logrotate.conf
similarity index 86%
rename from modules/dumps/files/logrotate.conf
rename to modules/dumps/files/web/xmldumps/logrotate.conf
index 4ec0848..7f185aa 100644
--- a/modules/dumps/files/logrotate.conf
+++ b/modules/dumps/files/web/xmldumps/logrotate.conf
@@ -1,5 +1,6 @@
 # logrotate config for datasets web logs
 # This file is managed by Puppet
+# modules/dumps/web/xmldumps/logrotate.conf
 
 /var/log/nginx/*.log
 {
diff --git a/modules/dumps/manifests/addchangesdumps/README.txt b/modules/dumps/manifests/addchangesdumps/README.txt
new file mode 100644
index 0000000..880ea49
--- /dev/null
+++ b/modules/dumps/manifests/addchangesdumps/README.txt
@@ -0,0 +1,2 @@
+placeholder, the relevant pieces of the snapshot and datasets modules
+will end up here.
diff --git a/modules/dumps/manifests/generation/client/nfs.pp b/modules/dumps/manifests/generation/client/nfs.pp
new file mode 100644
index 0000000..19e4280
--- /dev/null
+++ b/modules/dumps/manifests/generation/client/nfs.pp
@@ -0,0 +1,22 @@
+class dumps::generation::client::nfs {
+    require_package('nfs-common')
+
+    file { [ '/mnt/dumpsdata' ]:
+        ensure => 'directory',
+    }
+
+    $dumpsdataserver = $::site ? {
+        'eqiad' => 'dumpsdata1001.eqiad.wmnet',
+        default => 'dumpsdata1001.eqiad.wmnet',
+    }
+
+    mount { '/mnt/data':
+        ensure   => 'mounted',
+        device   => "${dumpsdataserver}:/data",
+        fstype   => 'nfs',
+        name     => '/mnt/dumpsdata',
+        options  => 'bg,hard,tcp,rsize=8192,wsize=8192,intr,nfsvers=3',
+        require  => File['/mnt/dumpsdata'],
+        remounts => false,
+    }
+}
diff --git a/modules/dumpsnfs/manifests/init.pp b/modules/dumps/manifests/generation/server/nfs.pp
similarity index 84%
rename from modules/dumpsnfs/manifests/init.pp
rename to modules/dumps/manifests/generation/server/nfs.pp
index 45a2afd..19ef17b 100644
--- a/modules/dumpsnfs/manifests/init.pp
+++ b/modules/dumps/manifests/generation/server/nfs.pp
@@ -10,7 +10,7 @@
         mode    => '0444',
         owner   => 'root',
         group   => 'root',
-        content => template('dumpsnfs/nfs_exports.erb'),
+        content => template('dumps/nfs/nfs_exports.erb'),
         require => Package['nfs-kernel-server'],
     }
 
@@ -29,7 +29,7 @@
         mode    => '0444',
         owner   => 'root',
         group   => 'root',
-        content => template('dumpsnfs/default-nfs-common.erb'),
+        content => template('dumps/nfs/default-nfs-common.erb'),
         require => Package['nfs-kernel-server'],
     }
 
@@ -37,7 +37,7 @@
         mode    => '0444',
         owner   => 'root',
         group   => 'root',
-        content => template('dumpsnfs/default-nfs-kernel-server.erb'),
+        content => template('dumps/nfs/default-nfs-kernel-server.erb'),
         require => Package['nfs-kernel-server'],
     }
 
diff --git a/modules/dumps/manifests/otherdumps.pp b/modules/dumps/manifests/otherdumps.pp
new file mode 100644
index 0000000..cd9ebeb
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps.pp
@@ -0,0 +1,37 @@
+class dumps::otherdumps {
+    # fixme only vars needed should get passed
+    # fixme put requires in where needed
+
+    class {'::dumps::otherdumps::config':
+        user => $user,
+        confsdir => $confsdir,
+        repodir => $repodir,
+        otherdumpsdir => $otherdumpsdir,
+        apachedir => $apachedir,
+        dumpdatadir => $dumpdatadir,
+    }
+    class {'::dumps::otherdumps::daily':
+        user => $user,
+        confsdir => $confsdir,
+        repodir => $repodir,
+        otherdumpsdir => $otherdumpsdir,
+        apachedir => $apachedir,
+        dumpdatadir => $dumpdatadir,
+    }
+    class {'::dumps::otherdumps::weekly':
+        user => $user,
+        confsdir => $confsdir,
+        repodir => $repodir,
+        otherdumpsdir => $otherdumpsdir,
+        apachedir => $apachedir,
+        dumpdatadir => $dumpdatadir,
+    }
+    class {'::dumps::otherdumps::wikidata':
+        user => $user,
+        confsdir => $confsdir,
+        repodir => $repodir,
+        otherdumpsdir => $otherdumpsdir,
+        apachedir => $apachedir,
+        dumpdatadir => $dumpdatadir,
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/common.pp b/modules/dumps/manifests/otherdumps/common.pp
new file mode 100644
index 0000000..9fd4322
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/common.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::common {
+    file { '/usr/local/etc/dump_functions.sh':
+        ensure => 'present',
+        path   => '/usr/local/etc/dump_functions.sh',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/dump_functions.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/config.pp b/modules/dumps/manifests/otherdumps/config.pp
new file mode 100644
index 0000000..cb000c8
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/config.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::config (
+    $confsdir = undef,
+    $dumpdatadir = undef,
+    $apachedir = undef,
+) {
+    file { $confsdir:
+        ensure  => 'directory',
+        path    => $confsdir,
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+    }
+    file { "${confsdir}/otherdumps.conf":
+        ensure  => 'present',
+        path    => "${confsdir}/otherdumps.conf",
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+        content => template('dumps/otherdumps/otherdumps.conf.erb'),
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/daily.pp b/modules/dumps/manifests/otherdumps/daily.pp
new file mode 100644
index 0000000..0951f3f
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/daily.pp
@@ -0,0 +1,29 @@
+class dumps::otherdumps::daily (
+    $user = undef,
+    $confsdir = undef,
+    $repodir = undef,
+    $otherdumpsdir = undef,
+)  {
+    include ::dumps::otherdumps::daily::mediaperprojectlists
+
+    file { '/usr/local/bin/otherdumps-dailies.sh':
+        ensure => 'present',
+        path   => '/usr/local/bin/otherdumps-dailies.sh',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/dailies.sh',
+    }
+
+    cron { 'otherdumps-dailies':
+        ensure      => 'present',
+        environment => '[email protected]',
+        user        => $user,
+        command     => "/usr/local/bin/otherdumps-dailies.sh --confsdir $confsdir --repodir $repodir --otherdumpsdir $otherdumpsdir",
+        minute      => '10',
+        hour        => '6',
+        weekday     => '0',
+        require     => File['/usr/local/bin/otherdumps-dailies.sh'],
+    }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/daily/mediatitles.pp b/modules/dumps/manifests/otherdumps/daily/mediatitles.pp
new file mode 100644
index 0000000..3db5908
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/daily/mediatitles.pp
@@ -0,0 +1,12 @@
+class dumps::otherdumps::daily::mediaperprojectlists(
+    $user = undef,
+) {
+    file { '/usr/local/bin/create-media-per-project-lists.sh':
+        ensure => 'present',
+        path   => '/usr/local/bin/create-media-per-project-lists.sh',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/dailies/create-media-per-project-lists.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly.pp b/modules/dumps/manifests/otherdumps/weekly.pp
new file mode 100644
index 0000000..427f07c
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly.pp
@@ -0,0 +1,48 @@
+class dumps::otherdumps::weekly(
+    $user = undef,
+    $confsdir = undef,
+    $repodir = undef,
+    $otherdumpsdir = undef,
+) {
+    class {'::dumps::otherdumps::weekly::categoriesrdf':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::weekly::cirrussearch':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::weekly::contentxlation':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::weekly::globalblocks':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::weekly::mediaperprojectlists':
+        user => $user,
+    }
+
+    file { '/usr/local/bin/otherdumps-weeklies.sh':
+        ensure  => 'present',
+        path    => '/usr/local/bin/otherdumps-weeklies.sh',
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+        source  => 'puppet:///modules/dumps/otherdumps/weeklies.sh',
+        require => Class['::dumps::otherdumps::weekly::categoriesrdf',
+                         '::dumps::otherdumps::weekly::cirrussearch',
+                         '::dumps::otherdumps::weekly::contentxlation',
+                         '::dumps::otherdumps::weekly::globalblocks',
+                         '::dumps::otherdumps::weekly::mediaperprojectlists'],
+    }
+
+    cron { 'otherdumps-weeklies':
+        ensure      => 'present',
+        environment => '[email protected]',
+        user        => $user,
+        command     => "/usr/local/bin/otherdumps-weeklies.sh --confsdir $confsdir --repodir $repodir --otherdumpsdir $otherdumpsdir",
+        minute      => '10',
+        hour        => '6',
+        weekday     => '0',
+        require     => File['/usr/local/bin/otherdumps-weeklies.sh'],
+    }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp b/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
new file mode 100644
index 0000000..9e0c0f5
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::weekly::categoriesrdf (
+    $user = undef,
+) {
+    file { '/var/log/categoriesrdf':
+        ensure => 'directory',
+        mode   => '0644',
+        owner  => $user,
+    }
+
+    logrotate::conf { 'categoriesrdf':
+        ensure => present,
+        source => 'puppet:///modules/dumps/otherdumps/logrot/logrotate.categoriesrdf',
+    }
+
+    file { '/usr/local/bin/dumpcategoriesrdf.sh':
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcategoriesrdf.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp b/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
new file mode 100644
index 0000000..253210e
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::weekly::cirrussearch(
+    $user = undef,
+) {
+    file { '/var/log/cirrusdump':
+        ensure => 'directory',
+        mode   => '0644',
+        owner  => $user,
+    }
+
+    logrotate::conf { 'cirrusdump':
+        ensure => present,
+        source => 'puppet:///modules/dumps/otherdumps/logrot/logrotate.cirrusdump',
+    }
+
+    file { '/usr/local/bin/dumpcirrussearch.sh':
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcirrussearch.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp b/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
new file mode 100644
index 0000000..2f6c6f5
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::weekly::contentxlation(
+    $user = undef,
+) {
+    file { '/usr/local/bin/dumpcontentxlation.sh':
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcontentxlation.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp b/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
new file mode 100644
index 0000000..d7fbcc9
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::weekly::globalblocks(
+    $user = undef,
+) {
+    file { '/usr/local/bin/dump-global-blocks.sh':
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/weeklies/dump-global-blocks.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp b/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
new file mode 100644
index 0000000..178c7c9
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
@@ -0,0 +1,12 @@
+class dumps::otherdumps::weekly::mediaperprojectlists(
+    $user = undef,
+) {
+    file { '/usr/local/bin/create-media-per-project-lists.sh':
+        ensure => 'present',
+        path   => '/usr/local/bin/create-media-per-project-lists.sh',
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/weeklies/create-media-per-project-lists.sh',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata.pp b/modules/dumps/manifests/otherdumps/wikidata.pp
new file mode 100644
index 0000000..fb9d16d
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata.pp
@@ -0,0 +1,40 @@
+class dumps::otherdumps::wikidata(
+    $user = undef,
+    $confsdir = undef,
+    $repodir = undef,
+    $otherdumpsdir = undef,
+) {
+    class {'::dumps::otherdumps::wikidata::common':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::wikidata::json':
+        user => $user,
+    }
+    class {'::dumps::otherdumps::wikidata::rdf':
+        user => $user,
+    }
+
+    file { '/usr/local/bin/wikidata-weeklies.sh':
+        ensure  => 'present',
+        path    => '/usr/local/bin/wikidata-weeklies.sh',
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+        source  => 'puppet:///modules/dumps/otherdumps/wikidata-weeklies.sh',
+        require => Class['::dumps::otherdumps::wikidata::common',
+                         '::dumps::otherdumps::wikidata::json',
+                         '::dumps::otherdumps::wikidata::rdf'],
+    }
+
+    cron { 'wikidata-weeklies':
+        ensure      => 'present',
+        environment => '[email protected]',
+        user        => $user,
+        command     => "/usr/local/bin/wikidata-weeklies.sh --confsdir $confsdir --repodir $repodir --otherdumpsdir $otherdumpsdir",
+        minute      => '10',
+        hour        => '6',
+        weekday     => '0',
+        require     => File['/usr/local/bin/wikidata-weeklies.sh'],
+    }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/common.pp b/modules/dumps/manifests/otherdumps/wikidata/common.pp
new file mode 100644
index 0000000..172923d
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/common.pp
@@ -0,0 +1,33 @@
+class dumps::otherdumps::wikidata::common(
+    $user = undef,
+) {
+    file { '/usr/local/bin/wikidatadumps-shared.sh':
+        mode   => '0755',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/wikidata/wikidatadumps-shared.sh',
+    }
+
+    file { '/var/log/wikidatadump':
+        ensure => 'directory',
+        mode   => '0755',
+        owner  => $user,
+        group  => 'www-data',
+    }
+
+    git::clone { 'DCAT-AP':
+        ensure    => 'present', # Don't automatically update.
+        directory => '/usr/local/share/dcat',
+        origin    => 'https://gerrit.wikimedia.org/r/operations/dumps/dcat',
+        branch    => 'master',
+        owner     => $user,
+        group     => 'www-data',
+    }
+
+    file { '/usr/local/etc/dcatconfig.json':
+        mode   => '0644',
+        owner  => 'root',
+        group  => 'root',
+        source => 'puppet:///modules/dumps/otherdumps/wikidata/dcatconfig.json',
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/json.pp b/modules/dumps/manifests/otherdumps/wikidata/json.pp
new file mode 100644
index 0000000..5ed7604
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/json.pp
@@ -0,0 +1,14 @@
+class dumps::otherdumps::wikidata::json(
+    $user   = undef,
+) {
+    # nope, requires the user param. ugh
+    include ::dumps::otherdumps::wikidata::common
+
+    file { '/usr/local/bin/dumpwikidatajson.sh':
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+        source  => 'puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatajson.sh',
+        require => Class['dumps::otherdumps::wikidata::common'],
+    }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/rdf.pp b/modules/dumps/manifests/otherdumps/wikidata/rdf.pp
new file mode 100644
index 0000000..9bfa238
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/rdf.pp
@@ -0,0 +1,14 @@
+class dumps::otherdumps::wikidata::rdf(
+    $user   = undef,
+) {
+    # nope needs 'user' param. ugh
+    include ::dumps::otherdumps::wikidata::common
+
+    file { '/usr/local/bin/dumpwikidatardf.sh':
+        mode    => '0755',
+        owner   => 'root',
+        group   => 'root',
+        source  => 'puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatardf.sh',
+        require => Class['dumps::otherdumps::wikidata::common'],
+    }
+}
diff --git a/modules/dumps/manifests/init.pp b/modules/dumps/manifests/web/xmldumps.pp
similarity index 100%
rename from modules/dumps/manifests/init.pp
rename to modules/dumps/manifests/web/xmldumps.pp
diff --git a/modules/dumps/manifests/zim.pp b/modules/dumps/manifests/web/zim.pp
similarity index 100%
rename from modules/dumps/manifests/zim.pp
rename to modules/dumps/manifests/web/zim.pp
diff --git a/modules/dumps/manifests/xmldumps/README.txt b/modules/dumps/manifests/xmldumps/README.txt
new file mode 100644
index 0000000..880ea49
--- /dev/null
+++ b/modules/dumps/manifests/xmldumps/README.txt
@@ -0,0 +1,2 @@
+placeholder, the relevant pieces of the snapshot and datasets modules
+will end up here.
diff --git a/modules/dumpsdirs/manifests/init.pp b/modules/dumps/nfs/dirs.pp
similarity index 100%
rename from modules/dumpsdirs/manifests/init.pp
rename to modules/dumps/nfs/dirs.pp
diff --git a/modules/dumpsnfs/templates/default-nfs-common.erb b/modules/dumps/templates/nfs/default-nfs-common.erb
similarity index 100%
rename from modules/dumpsnfs/templates/default-nfs-common.erb
rename to modules/dumps/templates/nfs/default-nfs-common.erb
diff --git a/modules/dumpsnfs/templates/default-nfs-kernel-server.erb b/modules/dumps/templates/nfs/default-nfs-kernel-server.erb
similarity index 100%
rename from modules/dumpsnfs/templates/default-nfs-kernel-server.erb
rename to modules/dumps/templates/nfs/default-nfs-kernel-server.erb
diff --git a/modules/dumpsnfs/templates/nfs_exports.erb b/modules/dumps/templates/nfs/nfs_exports.erb
similarity index 100%
rename from modules/dumpsnfs/templates/nfs_exports.erb
rename to modules/dumps/templates/nfs/nfs_exports.erb
diff --git a/modules/dumps/templates/otherdumps/otherdumps.conf.erb b/modules/dumps/templates/otherdumps/otherdumps.conf.erb
new file mode 100644
index 0000000..582589c
--- /dev/null
+++ b/modules/dumps/templates/otherdumps/otherdumps.conf.erb
@@ -0,0 +1,22 @@
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/otherdumps.conf.erb
+#############################################################
+
+[wiki]
+dblist=<%= @apachedir -%>/dblists/all.dblist
+privatelist=<%= @apachedir -%>/dblists/private.dblist
+dir=<%= @apachedir %>
+
+[output]
+public=<%= @dumpdatadir -%>/public
+temp=<%= @dumpdatadir -%>/temp
+
+[tools]
+php=/usr/bin/php5
+#php=/usr/bin/php
+mysql=/usr/bin/mysql
+mysqldump=/usr/bin/mysqldump
+gzip=/bin/gzip
+bzip2=/bin/bzip2
+sevenzip=/usr/bin/7za

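For reference, the way dumpwikidatajson.sh assembles its final file can be
sketched in isolation. It relies on gzip accepting several compressed members
concatenated into one file: the opening bracket, each shard, the separating
commas and the closing bracket are compressed separately and appended, and
decompressing the result yields one JSON array. The helper name
assemble_json_gz, the sample entities and the temporary directory below are
assumptions made for illustration only; they are not part of this change.

```shell
#!/bin/bash
# Minimal sketch (hypothetical helper, not part of the change): build one
# gzip file holding a JSON array out of separately gzipped shard files.
set -euo pipefail

assemble_json_gz() {
    local out="$1"; shift
    local first=1 shard
    # Open the JSON list as its own gzip member.
    echo '[' | gzip -f > "$out"
    for shard in "$@"; do
        if [ "$first" -eq 0 ]; then
            # Shards do not end with commas, so add one between shards.
            echo ',' | gzip -f >> "$out"
        fi
        # Appending an already-compressed shard adds another gzip member.
        cat "$shard" >> "$out"
        first=0
    done
    # Close the JSON list.
    echo ']' | gzip -f >> "$out"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two made-up shards standing in for the dumpJson.php --snippet output.
printf '{"id":"Q1"},\n{"id":"Q2"}\n' | gzip -9 > "${workdir}/shard.0.gz"
printf '{"id":"Q3"}\n' | gzip -9 > "${workdir}/shard.1.gz"

assemble_json_gz "${workdir}/all.json.gz" \
    "${workdir}/shard.0.gz" "${workdir}/shard.1.gz"

# gzip -dc reads all members in order and emits them as one stream.
result=$(gzip -dc "${workdir}/all.json.gz")
echo "$result"
```

The appeal of this approach is that the large shard files are never
decompressed and recompressed during assembly; only the few bytes of
punctuation are compressed on the spot.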
-- 
To view, visit https://gerrit.wikimedia.org/r/377231
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib3ed35e886d330e6d9112791b84e44d493ffaaea
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: ArielGlenn <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
