ArielGlenn has uploaded a new change for review. (
https://gerrit.wikimedia.org/r/377231 )
Change subject: convert 'other' dump cron jobs into role/profile under dumps
......................................................................
convert 'other' dump cron jobs into role/profile under dumps
[WIP] incomplete, broken, etc etc
This also includes the nfs client and server setup, since that is
a requirement for any dumps to run.
cron jobs are split into:
* weekly jobs, all listed in one bash script
* daily jobs, all listed in another bash script
* wikidata jobs, all listed in a third bash script
No cron jobs will be activated with this commit; all jobs are commented
out. Activating a job requires removing the corresponding code from
the existing snapshot module, and will be done one job at a time.
This will be split into a separate commit:
while we set up the new modules, we also do some cleanup on the
bash scripts that will run out of cron:
* convert tabs to spaces
* double quotes and curly brackets around var names as needed
* full paths to executables
* php, gzip, bzip2 paths always read from dumps config file
* use $() instead of backquotes
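For illustration only, a minimal sketch of what these cleanups might look
like in practice. The function name build_outdir and the sample path are
hypothetical and do not appear in this change:

```shell
#!/bin/bash
# Hypothetical example; build_outdir is not part of this change.
#
# Old style (backquotes, unquoted variables, bare command names):
#   today=`date +%Y%m%d`
#   outdir=$base/$today
#
# Cleaned-up style: $() substitution, quoted and braced variable
# names, and a full path to the executable.
build_outdir() {
    local base="$1"
    local today
    today=$( /bin/date '+%Y%m%d' )
    echo "${base}/${today}"
}

# Quoting keeps this working even when the base path contains spaces.
build_outdir "/tmp/other dumps"
```

The quoting matters most for paths read from configuration, which may
contain characters the shell would otherwise split or expand.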
Bug: T175528
Change-Id: Ib3ed35e886d330e6d9112791b84e44d493ffaaea
---
M hieradata/common.yaml
R modules/dumps/files/htmldumps/nginx.zim.conf
A modules/dumps/files/otherdumps/dailies.sh
A modules/dumps/files/otherdumps/dump_functions.sh
A modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
A modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
A modules/dumps/files/otherdumps/weeklies.sh
A modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
A modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
A modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
A modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
A modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
A modules/dumps/files/otherdumps/wikidata-weeklies.sh
A modules/dumps/files/otherdumps/wikidata/dcatconfig.json
A modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
A modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
A modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
R modules/dumps/files/web/xmldumps/favicon.ico
R modules/dumps/files/web/xmldumps/logrotate.conf
A modules/dumps/manifests/addchangesdumps/README.txt
A modules/dumps/manifests/generation/client/nfs.pp
R modules/dumps/manifests/generation/server/nfs.pp
A modules/dumps/manifests/otherdumps.pp
A modules/dumps/manifests/otherdumps/common.pp
A modules/dumps/manifests/otherdumps/config.pp
A modules/dumps/manifests/otherdumps/daily.pp
A modules/dumps/manifests/otherdumps/daily/mediatitles.pp
A modules/dumps/manifests/otherdumps/weekly.pp
A modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
A modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
A modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
A modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
A modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
A modules/dumps/manifests/otherdumps/wikidata.pp
A modules/dumps/manifests/otherdumps/wikidata/common.pp
A modules/dumps/manifests/otherdumps/wikidata/json.pp
A modules/dumps/manifests/otherdumps/wikidata/rdf.pp
R modules/dumps/manifests/web/xmldumps.pp
R modules/dumps/manifests/web/zim.pp
A modules/dumps/manifests/xmldumps/README.txt
R modules/dumps/nfs/dirs.pp
R modules/dumps/templates/nfs/default-nfs-common.erb
R modules/dumps/templates/nfs/default-nfs-kernel-server.erb
R modules/dumps/templates/nfs/nfs_exports.erb
A modules/dumps/templates/otherdumps/otherdumps.conf.erb
45 files changed, 1,391 insertions(+), 3 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/31/377231/1
diff --git a/hieradata/common.yaml b/hieradata/common.yaml
index 9f8d0d0..eb600b8 100644
--- a/hieradata/common.yaml
+++ b/hieradata/common.yaml
@@ -312,6 +312,12 @@
- snapshot1005.eqiad.wmnet
- snapshot1006.eqiad.wmnet
- snapshot1007.eqiad.wmnet
+dumps_client_config:
+ repodir: /srv/deployment/dumps/dumps/xmldumps-backup
+ confsdir: /etc/dumps/confs
+ nfsmount: /mnt/dumpsdata
+ otherdumpsdir: /mnt/dumpsdata/other
+ apachedir: /srv/mediawiki
# Schemas names that match this regex
# will not be produced to the eventlogging-valid-mixed
diff --git a/modules/dumps/files/nginx.zim.conf b/modules/dumps/files/htmldumps/nginx.zim.conf
similarity index 100%
rename from modules/dumps/files/nginx.zim.conf
rename to modules/dumps/files/htmldumps/nginx.zim.conf
diff --git a/modules/dumps/files/otherdumps/dailies.sh b/modules/dumps/files/otherdumps/dailies.sh
new file mode 100644
index 0000000..18b687c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/dailies.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/dailies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/bin/find ${otherdir}/pagetitles/ -maxdepth 1 -type d -mtime +90 -exec rm -rf {} \; ; /usr/bin/find ${otherdir}/mediatitles/ -maxdepth 1 -type d -mtime +90 -exec rm -rf {} \;
+#cd ${repodir}; /usr/bin/python onallwikis.py --configfile ${confsdir}/wikidump.conf.monitor --filenameformat '{w}-{d}-all-titles-in-ns-0.gz' --outdir "${otherdir}/pagetitles/{d}" --query "'select page_title from page where page_namespace=0;'"
+#cd ${repodir}; /usr/bin/python onallwikis.py --configfile ${confsdir}/wikidump.conf.monitor --filenameformat '{w}-{d}-all-media-titles.gz' --outdir "${otherdir}/mediatitles/{d}" --query "'select page_title from page where page_namespace=6;'"
diff --git a/modules/dumps/files/otherdumps/dump_functions.sh b/modules/dumps/files/otherdumps/dump_functions.sh
new file mode 100644
index 0000000..858d935
--- /dev/null
+++ b/modules/dumps/files/otherdumps/dump_functions.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/dump_functions.sh
+#############################################################
+#
+# functions used by "other" dumps cron jobs (not the main xml/sql ones)
+
+source /usr/local/etc/set_dump_dirs.sh
+
+checkval() {
+ setting=$1
+ value=$2
+ if [ -z "$value" -o "$value" == "null" ]; then
+ echo "failed to retrieve value of $setting from $configfile" >& 2
+ exit 1
+ fi
+}
+
+getsetting() {
+ results=$1
+ section=$2
+ setting=$3
+ echo "$results" | /usr/bin/jq -M -r ".$section.$setting"
+}
+
+
+standard_usage() {
+ echo "Usage: $0 --confsdir <path> --repodir <path> --otherdumpsdir <path>"
+ echo
+ echo " --confsdir path to dir with configuration files for dump generation"
+ echo " --repodir path to dir with scripts for dump generation"
+ echo " --otherdumpsdir path to dir where misc dump output files are written"
+}
+
+get_standard_opts() {
+ while [ $# -gt 0 ]; do
+ if [ "$1" == "--confsdir" ]; then
+ confsdir="$2"
+ shift; shift;
+ elif [ "$1" == "--repodir" ]; then
+ repodir="$2"
+ shift; shift;
+ elif [ "$1" == "--otherdumpsdir" ]; then
+ otherdumpsdir="$2"
+ shift; shift;
+ else
+ echo "$0: Unknown option $1"
+ standard_usage
+ exit 1
+ fi
+ done
+}
diff --git a/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf b/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
new file mode 100644
index 0000000..b78d423
--- /dev/null
+++ b/modules/dumps/files/otherdumps/logrot/logrotate.categoriesrdf
@@ -0,0 +1,11 @@
+# This file is managed by puppet
+# puppet:///modules/dumps/otherdumps/logrot/logrotate.categoriesrdf
+#
+/var/log/categoriesrdf/*.log {
+ daily
+ compress
+ delaycompress
+ missingok
+ maxage 22
+ nocreate
+}
diff --git a/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump b/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
new file mode 100644
index 0000000..fd033d8
--- /dev/null
+++ b/modules/dumps/files/otherdumps/logrot/logrotate.cirrusdump
@@ -0,0 +1,11 @@
+# This file is managed by puppet
+# puppet:///modules/dumps/otherdumps/logrot/logrotate.cirrusdump
+#
+/var/log/cirrusdump/*.log {
+ daily
+ compress
+ delaycompress
+ missingok
+ maxage 22
+ nocreate
+}
diff --git a/modules/dumps/files/otherdumps/weeklies.sh b/modules/dumps/files/otherdumps/weeklies.sh
new file mode 100644
index 0000000..a792b7f
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/local/bin/create-media-per-project-lists.sh
+#/usr/local/bin/dumpcirrussearch.sh --config ${confsdir}/wikidump.conf
+#/usr/local/bin/dumpcategoriesrdf.sh --config ${confsdir}/wikidump.conf --list ${apachedir}/dblists/categories-rdf.dblist
+#/usr/bin/find ${otherdir}/contenttranslation/ -maxdepth 1 -type d -mtime +90 -exec rm -rf {} \;
+#/usr/local/bin/dumpcontentxlation.sh
+#/usr/bin/find ${otherdir}/globalblocks/ -maxdepth 1 -type d -mtime +90 -exec rm -rf {} \;
+#/usr/local/bin/dump-global-blocks.sh --config ${confsdir}/wikidump.conf
+#/usr/local/bin/create-media-per-project-lists.sh
diff --git a/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh b/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
new file mode 100755
index 0000000..d0278c9
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/create-media-per-project-lists.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/create-media-per-project-lists.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+DATE=$( /bin/date '+%Y%m%d' )
+outputdir="${otherdumpsdir}/imageinfo/$DATE"
+configfile="${confsdir}/wikidump.conf.media"
+errors=0
+
+cd "$repodir" || exit 1
+
+/usr/bin/python "${repodir}/onallwikis.py" --outdir "$outputdir" \
+ --config "$configfile" --nooverwrite \
+ --query "'select img_name, img_timestamp from image;'" \
+ --filenameformat "{w}-{d}-local-wikiqueries.gz"
+if [ $? -ne 0 ]; then
+ echo "failed sql dump of image tables"
+ errors=1
+fi
+
+basewiki=commonswiki
+
+/usr/bin/python "${repodir}/onallwikis.py" --outdir "$outputdir" \
+ --base "$basewiki" \
+ --config "$configfile" --nooverwrite \
+ --query "'select gil_to from globalimagelinks where gil_wiki=\"{w}\";'" \
+ --filenameformat "{w}-{d}-remote-wikiqueries.gz"
+
+if [ $? -ne 0 ]; then
+ echo "failed sql dump of globalimagelinks table"
+ errors=1
+fi
+
+exit $errors
diff --git a/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh b/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
new file mode 100644
index 0000000..35b8cc6
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dump-global-blocks.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dump-global-blocks.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+get_db_host() {
+ apachedir=$1
+
+ multiversionscript="${apachedir}/multiversion/MWScript.php"
+ if [ -e "$multiversionscript" ]; then
+ host=$( "$php" -q "$multiversionscript" extensions/CentralAuth/maintenance/getCentralAuthDBInfo.php --wiki="aawiki" ) || { echo "$host" >& 2; host=""; }
+ fi
+ if [ -z "$host" ]; then
+ echo "can't locate db server for centralauth, exiting." >& 2
+ exit 1
+ fi
+ echo $host
+}
+
+get_db_user() {
+ apachedir=$1
+
+ multiversionscript="${apachedir}/multiversion/MWScript.php"
+ if [ -e "$multiversionscript" ]; then
+ db_user=$( echo 'echo $wgDBadminuser;' | "$php" "$multiversionscript" eval.php aawiki )
+ fi
+ if [ -z "$db_user" ]; then
+ echo "can't get db user name, exiting." >& 2
+ exit 1
+ fi
+ echo "$db_user"
+}
+
+get_db_pass() {
+ apachedir=$1
+
+ multiversionscript="${apachedir}/multiversion/MWScript.php"
+ if [ -e "$multiversionscript" ]; then
+ db_pass=$( echo 'echo $wgDBadminpassword;' | "$php" "$multiversionscript" eval.php aawiki )
+ fi
+ if [ -z "$db_pass" ]; then
+ echo "can't get db password, exiting." >& 2
+ exit 1
+ fi
+ echo "$db_pass"
+}
+
+dump_tables() {
+ tables=$1
+ outputdir=$2
+ mysqldump=$3
+ gzip=$4
+ db_user=$5
+ db_pass=$6
+
+ today=$( /bin/date +%Y%m%d )
+ dir="$outputdir/$today"
+ /bin/mkdir -p "$dir"
+ for t in $tables; do
+ outputfile="${dir}/${today}-${t}.gz"
+ if [ "$dryrun" == "true" ]; then
+ echo "would run:"
+ echo -n "${mysqldump} -u ${db_user} -p${db_pass} -h ${host} --opt --quick --skip-add-locks --skip-lock-tables centralauth ${t}"
+ echo " | ${gzip} > ${outputfile}"
+ else
+ # echo "dumping $t into $outputfile"
+ "$mysqldump" -u "$db_user" -p"$db_pass" -h "$host" \
+ --opt --quick --skip-add-locks --skip-lock-tables \
+ centralauth "$t" | "$gzip" > "$outputfile"
+ fi
+ done
+}
+
+usage() {
+ echo "Usage: $0 [--config <pathtofile>] [--dryrun]" >& 2
+ echo >& 2
+ echo " --config path to configuration file for dump generation" >& 2
+ echo " (default value: ${confsdir}/wikidump.conf)" >& 2
+ echo " --dryrun don't run dump, show what would have been done" >& 2
+ exit 1
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+while [ $# -gt 0 ]; do
+ if [ $1 == "--config" ]; then
+ configfile="$2"
+ shift; shift
+ elif [ $1 == "--dryrun" ]; then
+ dryrun="true"
+ shift
+ else
+ echo "$0: Unknown option $1" >& 2
+ usage
+ fi
+done
+
+args="wiki:dir;tools:gzip,mysqldump,php"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args" )
+
+apachedir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+mysqldump=$( getsetting "$results" "tools" "mysqldump" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "apachedir" "gzip" "mysqldump" "php"; do
+ checkval "$settingname" "${!settingname}"
+done
+
+outputdir="${otherdumpsdir}/globalblocks"
+
+host=$( get_db_host "$apachedir" ) || exit 1
+db_user=$( get_db_user "$apachedir" ) || exit 1
+db_pass=$( get_db_pass "$apachedir" ) || exit 1
+
+dump_tables "globalblocks" "$outputdir" "$mysqldump" "$gzip" "$db_user" "$db_pass"
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh b/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
new file mode 100755
index 0000000..bad0be3
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcategoriesrdf.sh
@@ -0,0 +1,134 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcategoriesrdf.sh
+#############################################################
+#
+# Generate an RDF dump of categories for all wikis in
+# categories-rdf list and remove old ones.
+
+source /usr/local/etc/dump_functions.sh
+
+usage() {
+ echo "Usage: $0 --list wikis.dblist [--config <pathtofile>] [--dryrun]"
+ echo
+ echo " --config path to configuration file for dump generation"
+ echo " (default value: ${confsdir}/wikidump.conf)"
+ echo " --list file containing list of the wikis to dump"
+ echo " --dryrun don't run dump, show what would have been done"
+ exit 1
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+dumpFormat="ttl"
+dbList="categories-rdf"
+
+while [ $# -gt 0 ]; do
+ if [ $1 == "--config" ]; then
+ configfile="$2"
+ shift; shift;
+ elif [ $1 == "--dryrun" ]; then
+ dryrun="true"
+ shift
+ elif [ $1 == "--list" ]; then
+ dbList="$2"
+ shift; shift;
+ else
+ echo "$0: Unknown option $1"
+ usage
+ fi
+done
+
+if [ -z "$dbList" -o ! -f "$dbList" ]; then
+ echo "Valid wiki list must be specified"
+ echo "Exiting..."
+ exit 1
+fi
+
+if [ ! -f "$configfile" ]; then
+ echo "Could not find config file: $configfile"
+ echo "Exiting..."
+ exit 1
+fi
+
+args="wiki:dir,privatelist;tools:gzip,php;output:public"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args" )
+
+deployDir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+privateList=$( getsetting "$results" "wiki" "privatelist" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "deployDir" "gzip" "privateList" "php"; do
+ checkval "$settingname" "${!settingname}"
+done
+
+today=$( /bin/date +'%Y%m%d' )
+targetDirBase="${otherdumpsdir}/categoriesrdf"
+targetDir="${targetDirBase}/${today}"
+timestampsDir="${targetDirBase}/lastdump"
+multiVersionScript="${deployDir}/multiversion/MWScript.php"
+
+# remove old datasets
+daysToKeep=70
+cutOff=$(( $( /bin/date +%s ) - $(( $daysToKeep + 1 )) * 24 * 3600))
+if [ -d "$targetDirBase" ]; then
+ for folder in $(/bin/ls -d -r "${targetDirBase}/"*); do
+ creationTime=$( /bin/date --utc --date="$(basename $folder )" +%s 2>/dev/null)
+ if [ -n "$creationTime" ]; then
+ if [ "$cutOff" -gt "$creationTime" ]; then
+ if [ "$dryrun" == "true" ]; then
+ echo /bin/rm "${folder}/"*".${dumpFormat}.gz"
+ echo /bin/rmdir "${folder}"
+ else
+ /bin/rm -f "${folder}/"*".${dumpFormat}.gz"
+ /bin/rmdir "${folder}"
+ fi
+ fi
+ fi
+ done
+fi
+
+# create todays folder
+if [ "$dryrun" == "true" ]; then
+ echo /bin/mkdir -p "$targetDir"
+ echo /bin/mkdir -p "$timestampsDir"
+else
+ if ! /bin/mkdir -p "$targetDir"; then
+ echo "Can't make output directory: $targetDir"
+ echo "Exiting..."
+ exit 1
+ fi
+ if ! /bin/mkdir -p "$timestampsDir"; then
+ echo "Can't make output directory: $timestampsDir"
+ echo "Exiting..."
+ exit 1
+ fi
+fi
+
+# iterate over configured wikis
+/bin/cat "$dbList" | while read wiki; do
+ # exclude all private wikis
+ if ! /bin/egrep -q "^${wiki}$" "$privateList"; then
+ filename="${wiki}-${today}-categories"
+ targetFile="${targetDir}/${filename}.${dumpFormat}.gz"
+ tsFile="${timestampsDir}/${wiki}-categories.last"
+ if [ "$dryrun" == "true" ]; then
+ echo "${php} ${multiVersionScript} maintenance/dumpCategoriesAsRdf.php --wiki=${wiki} --format=${dumpFormat} 2> /var/log/categoriesrdf/${filename}.log | ${gzip} > ${targetFile}"
+ else
+ "$php" "$multiVersionScript" maintenance/dumpCategoriesAsRdf.php \
+ "--wiki=${wiki}" \
+ "--format=${dumpFormat}" 2> "/var/log/categoriesrdf/${filename}.log" \
+ | "$gzip" > "$targetFile"
+ echo "$today" > "$tsFile"
+ fi
+ fi
+done
+
+
+# Maintain a 'latest' symlink always pointing at the most recently completed dump
+if [ "$dryrun" == "false" ]; then
+ cd "$targetDirBase"
+ /bin/ln -snf "$today" "latest"
+fi
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh b/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
new file mode 100644
index 0000000..ba81f9c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcirrussearch.sh
@@ -0,0 +1,147 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcirrussearch.sh
+#############################################################
+#
+# Generate a json dump of cirrussearch indices for all enabled
+# wikis and remove old ones.
+
+source /usr/local/etc/dump_functions.sh
+
+usage() {
+ echo "Usage: $0 [--config <pathtofile>] [--dryrun]"
+ echo
+ echo " --config path to configuration file for dump generation"
+ echo " (default value: ${confsdir}/wikidump.conf)"
+ echo " --dryrun don't run dump, show what would have been done"
+ exit 1
+}
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+while [ $# -gt 0 ]; do
+ if [ $1 == "--config" ]; then
+ configfile="$2"
+ shift; shift;
+ elif [ $1 == "--dryrun" ]; then
+ dryrun="true"
+ shift
+ else
+ echo "$0: Unknown option $1"
+ usage
+ fi
+done
+
+if [ ! -f "$configfile" ]; then
+ echo "Could not find config file: $configfile"
+ echo "Exiting..."
+ exit 1
+fi
+
+args="wiki:dir,dblist,privatelist;tools:gzip,php;output:public"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args" )
+
+deployDir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+allList=$( getsetting "$results" "wiki" "dblist" ) || exit 1
+privateList=$( getsetting "$results" "wiki" "privatelist" ) || exit 1
+gzip=$( getsetting "$results" "tools" "gzip" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "deployDir" "allList" "privateList" "gzip" "php"; do
+ checkval "$settingname" "${!settingname}"
+done
+
+today=$( /bin/date +'%Y%m%d' )
+targetDirBase="${otherdumpsdir}/cirrussearch"
+targetDir="${targetDirBase}/${today}"
+multiVersionScript="${deployDir}/multiversion/MWScript.php"
+
+# remove old datasets
+daysToKeep=70
+cutOff=$(( $( /bin/date +%s ) - $(( $daysToKeep + 1 )) * 24 * 3600))
+if [ -d "$targetDirBase" ]; then
+ for folder in $(/bin/ls -d -r "${targetDirBase}/"*); do
+ creationTime=$( /bin/date --utc --date="$(basename $folder)" +%s 2>/dev/null )
+ if [ -n "$creationTime" ]; then
+ if [ "$cutOff" -gt "$creationTime" ]; then
+ if [ "$dryrun" == "true" ]; then
+ echo /bin/rm "${folder}/"*.json.gz
+ echo /bin/rmdir "$folder"
+ else
+ /bin/rm -f "${folder}/"*.json.gz
+ /bin/rmdir "$folder"
+ fi
+ fi
+ fi
+ done
+fi
+
+# create todays folder
+if [ "$dryrun" == "true" ]; then
+ echo /bin/mkdir -p "$targetDir"
+else
+ if ! /bin/mkdir -p "$targetDir"; then
+ echo "Can't make output directory: ${targetDir}"
+ echo "Exiting..."
+ exit 1
+ fi
+fi
+
+# iterate over all known wikis
+/bin/cat "$allList" | while read wiki; do
+ # exclude all private wikis
+ if ! /bin/egrep -q "^${wiki}$" "$privateList"; then
+ # most wikis only have two indices
+ types="content general"
+ # commonswiki is special, it also has a file index
+ if [ "$wiki" == "commonswiki" ]; then
+ types="${types} file"
+ fi
+ # run the dump for each index type
+ for type in $types; do
+ filename="${wiki}-${today}-cirrussearch-${type}"
+ targetFile="${targetDir}/${filename}.json.gz"
+ if [ "$dryrun" == "true" ]; then
+ echo "${php} ${multiVersionScript} extensions/CirrusSearch/maintenance/dumpIndex.php --wiki=${wiki} --indexType=${type} 2> /var/log/cirrusdump/cirrusdump-${filename}.log | ${gzip} > ${targetFile}"
+ else
+ "$php" "$multiVersionScript" \
+ extensions/CirrusSearch/maintenance/dumpIndex.php \
+ "--wiki=${wiki}" \
+ "--indexType=${type}" \
+ 2> "/var/log/cirrusdump/cirrusdump-${filename}.log" \
+ | "$gzip" > "$targetFile"
+ fi
+ done
+ fi
+done
+
+# dump the metastore index (contains persistent states used by cirrus
+# administrative tasks). This index is cluster scoped and not bound to a
+# particular wiki (we pass --wiki to mwscript because it's mandatory but this
+# option is not used by the script itself)
+clusters="eqiad codfw"
+for cluster in $clusters; do
+ filename="cirrus-metastore-${cluster}-${today}"
+ targetFile="${targetDir}/${filename}.json.gz"
+ if [ "$dryrun" == "true" ]; then
+ echo "${php} ${multiVersionScript} extensions/CirrusSearch/maintenance/metastore.php --wiki=metawiki --dump --cluster=${cluster} 2>> /var/log/cirrusdump/cirrusdump-${filename}.log | ${gzip} > ${targetFile}"
+ else
+ "$php" "$multiVersionScript" \
+ extensions/CirrusSearch/maintenance/metastore.php \
+ --wiki=metawiki \
+ --dump \
+ "--cluster=${cluster}" \
+ 2>> "/var/log/cirrusdump/cirrusdump-${filename}.log" \
+ | "$gzip" > "$targetFile"
+ fi
+done
+
+
+
+# Maintain a 'current' symlink always pointing at the most recently completed dump
+if [ "$dryrun" == "false" ]; then
+ cd "$targetDirBase"
+ /bin/rm -f "current"
+ /bin/ln -s "$today" "current"
+fi
diff --git a/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh b/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
new file mode 100644
index 0000000..b4ed5da
--- /dev/null
+++ b/modules/dumps/files/otherdumps/weeklies/dumpcontentxlation.sh
@@ -0,0 +1,84 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/weeklies/dumpcontentxlation.sh
+#############################################################
+
+source /usr/local/etc/dump_functions.sh
+
+do_dump() {
+ format=$1
+ plaintext=$2
+ command="${php} ${multiversionscript} ${xlationscript} --wiki enwiki -q --split-at 500 --outputdir ${outdir} --compression gzip --format ${format}"
+
+ if [ -n "$plaintext" ]; then
+ command="${command} --plaintext"
+ fi
+
+ if [ "${dryrun}" == "true" ]; then
+ echo "$command"
+ else
+ $command
+ fi
+}
+
+usage() {
+ echo "Usage: $0 [--config <pathtofile>] [--dryrun]"
+ echo
+ echo " --config path to configuration file for dump generation"
+ echo " (default value: ${confsdir}/wikidump.conf)"
+ echo " --dryrun display dump command instead of running it"
+ exit 1
+}
+
+#####################
+# MAIN
+#####################
+
+configfile="${confsdir}/wikidump.conf"
+dryrun="false"
+
+#####################
+# Get cmdline args
+#####################
+
+while [ $# -gt 0 ]; do
+ if [ $1 == "--config" ]; then
+ configfile="$2"
+ shift; shift
+ elif [ $1 == "--dryrun" ]; then
+ dryrun="true"
+ shift
+ else
+ echo "$0: Unknown option $1"
+ usage
+ fi
+done
+
+#####################
+# Get config settings
+#####################
+
+args="wiki:dir;tools:php"
+results=$( /usr/bin/python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args" )
+
+apachedir=$( getsetting "$results" "wiki" "dir" ) || exit 1
+php=$( getsetting "$results" "tools" "php" ) || exit 1
+
+for settingname in "apachedir" "php"; do
+ checkval "$settingname" "${!settingname}"
+done
+
+####################
+# Dump
+####################
+
+today=$( /bin/date +%Y%m%d )
+outdir="${otherdumpsdir}/contenttranslation/${today}"
+/bin/mkdir -p "$outdir" || exit 1
+multiversionscript="${apachedir}/multiversion/MWScript.php"
+xlationscript="extensions/ContentTranslation/scripts/dump-corpora.php"
+
+do_dump json
+do_dump json plaintext
+do_dump tmx plaintext
diff --git a/modules/dumps/files/otherdumps/wikidata-weeklies.sh b/modules/dumps/files/otherdumps/wikidata-weeklies.sh
new file mode 100644
index 0000000..b80428f
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata-weeklies.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/otherdumps/wikidata-weeklies.sh
+#############################################################
+
+source /usr/local/bin/dump_functions.sh
+
+#/usr/local/bin/dumpwikidatajson.sh
+#/usr/local/bin/dumpwikidatardf.sh
diff --git a/modules/dumps/files/otherdumps/wikidata/dcatconfig.json b/modules/dumps/files/otherdumps/wikidata/dcatconfig.json
new file mode 100644
index 0000000..bf6a408
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dcatconfig.json
@@ -0,0 +1,54 @@
+{
+ "directory": null,
+ "api-enabled": true,
+ "dumps-enabled": true,
+ "uri": "https://www.wikidata.org/about",
+ "catalog-homepage": "https://www.wikidata.org",
+ "catalog-issued": "2012-10-30",
+ "catalog-license": "http://creativecommons.org/publicdomain/zero/1.0/",
+ "catalog-i18n": "https://www.wikidata.org/w/index.php?title=MediaWiki:DCAT.json&action=raw",
+ "keywords": ["data store", "semantic", "knowledgebase", "Wikimedia", "user generated content", "UGC", "Wikipedia", "Wikidata"],
+ "themes": ["1428", "441", "2191", "384", "7374"],
+ "publisher": {
+ "name": "Wikimedia Foundation",
+ "homepage": "http://wikimediafoundation.org/",
+ "email": "[email protected]",
+ "publisherType": "NonProfitOrganisation"
+ },
+ "contactPoint": {
+ "vcardType" : "Organization",
+ "name": "Wikidata information team",
+ "email": "[email protected]"
+ },
+ "ld-info": {
+ "accessURL": "https://www.wikidata.org/entity/",
+ "mediatype": {
+ "json": "application/json",
+ "n3": "application/n-triples",
+ "rdf": "application/rdf+xml",
+ "ttl": "text/turtle",
+ "html": "text/html"
+ },
+ "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+ },
+ "api-info": {
+ "accessURL": "https://www.wikidata.org/w/api.php",
+ "mediatype": {
+ "json": "application/json",
+ "xml": "application/xml"
+ },
+ "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+ },
+ "dump-info": {
+ "accessURL": "https://dumps.wikimedia.org/wikidatawiki/entities/$1",
+ "mediatype": {
+ "json": "application/json",
+ "ttl": "text/turtle"
+ },
+ "compression": {
+ "gzip": "gz",
+ "bzip2": "bz2"
+ },
+ "license": "http://creativecommons.org/publicdomain/zero/1.0/"
+ }
+}
diff --git a/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh b/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
new file mode 100644
index 0000000..5df293c
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dumpwikidatajson.sh
@@ -0,0 +1,117 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatajson.sh
+#############################################################
+#
+# Generate a json dump for Wikidata and remove old ones.
+#
+# Marius Hoch < [email protected] >
+
+. /usr/local/bin/wikidatadumps-shared.sh
+
+filename="wikidata-${today}-all"
+targetFileGzip="${targetDir}/${filename}.json.gz"
+targetFileBzip2="${targetDir}/${filename}.json.bz2"
+failureFile=/tmp/dumpwikidatajson-failure
+mainLogFile="/var/log/wikidatadump/dumpwikidatajson-${filename}-main.log"
+
+shards=5
+
+# Try to create the dump (up to three times).
+retries=0
+
+while true; do
+ i=0
+ /bin/rm -f "$failureFile"
+
+ while [ $i -lt $shards ]; do
+ (
+ set -o pipefail
+ errorLog="/var/log/wikidatadump/dumpwikidatajson-${filename}-${i}.log"
+ "$php" "$multiversionscript" \
+ extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpJson.php \
+ --wiki wikidatawiki \
+ --shard "$i" \
+ --sharding-factor "$shards" \
+ --snippet 2>> "$errorLog" \
+ | "$gzip" -9 > "${tempDir}/wikidataJson.${i}.gz"
+ exitCode=$?
+ if [ $exitCode -gt 0 ]; then
+ echo -e "\n\n($( date --iso-8601=minutes )) Process for shard ${i} failed with exit code ${exitCode}" >> "$errorLog"
+ echo 1 > "$failureFile"
+
+ # Kill all remaining dumpers and start over.
+ kill -- -$$
+ fi
+ ) &
+ let i++
+ done
+
+ wait
+
+ if [ -f "$failureFile" ]; then
+ # Something went wrong, let's clean up and maybe retry. Leave logs in place.
+ /bin/rm -f "$failureFile"
+ /bin/rm -f "${tempDir}/wikidataJson."*.gz
+ let retries++
+ echo "($( date --iso-8601=minutes )) Dumping one or more shards failed. Retrying." >> "$mainLogFile"
+
+ if [ $retries -eq 3 ]; then
+ exit 1
+ fi
+
+ # Wait 10 minutes (in case of database problems or other instabilities), then try again
+ sleep 600
+ continue
+ fi
+
+ break
+
+done
+
+# Open the json list
+echo '[' | "$gzip" -f > "${tempDir}/wikidataJson.gz"
+
+i=0
+while [ $i -lt $shards ]; do
+ tempFile="${tempDir}/wikidataJson.${i}.gz"
+ if [ ! -f "$tempFile" ]; then
+ echo "${tempFile} does not exist. Aborting." >> "$mainLogFile"
+ exit 1
+ fi
+ fileSize=$( stat --printf="%s" "$tempFile" )
+ if [ $fileSize -lt $( expr 10500000000 / $shards ) ]; then
+ echo "File size of ${tempFile} is only ${fileSize}. Aborting." >> "$mainLogFile"
+ exit 1
+ fi
+ /bin/cat "$tempFile" >> "${tempDir}/wikidataJson.gz"
+ /bin/rm "$tempFile"
+ let i++
+ if [ $i -lt $shards ]; then
+ # Shards don't end with commas so add commas to separate them
+ echo ',' | "$gzip" -f >> "${tempDir}/wikidataJson.gz"
+ fi
+done
+
+# Close the json list
+echo -e '\n]' | "$gzip" -f >> "${tempDir}/wikidataJson.gz"
+
+mv "${tempDir}/wikidataJson.gz" "$targetFileGzip"
+
+# Legacy directory (with legacy naming scheme)
+legacyDirectory="${publicDir}/other/wikidata"
+/bin/ln -s "../wikibase/wikidatawiki/${today}/${filename}.json.gz"
"${legacyDirectory}/${today}.json.gz"
+find "$legacyDirectory" -name '*.json.gz' -mtime + $( expr $daysToKeep + 1 )
-delete
+
+# (Re-)create the link to the latest
+/bin/ln -fs "${today}/${filename}.json.gz"
"${targetDirBase}/latest-all.json.gz"
+
+# Create the bzip2 from the gzip one and update the latest-all.json.bz2 link
+"$gzip" -dc "$targetFileGzip" | "$bzip2" -c > "${tempDir}/wikidataJson.bz2"
+/bin/mv "${tempDir}/wikidataJson.bz2" "$targetFileBzip2"
+/bin/ln -fs "${today}/${filename}.json.bz2" "${targetDirBase}/latest-all.json.bz2"
+
+pruneOldDirectories
+pruneOldLogs
+runDcat
diff --git a/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
b/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
new file mode 100755
index 0000000..bdb94f0
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/dumpwikidatardf.sh
@@ -0,0 +1,127 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatardf.sh
+#############################################################
+#
+# Generate an RDF dump for Wikidata and remove old ones.
+#
+# Marius Hoch < [email protected] >
+
+. /usr/local/bin/wikidatadumps-shared.sh
+
+declare -A dumpNameToFlavor
+dumpNameToFlavor=(["all"]="full-dump" ["truthy"]="truthy-dump")
+
+dumpName="$1"
+
+if [ -z "$dumpName" ]; then
+ echo "No dump name given."
+ exit 1
+fi
+
+dumpFlavor=${dumpNameToFlavor["$dumpName"]}
+if [ -z "$dumpFlavor" ]; then
+ echo "Unknown dump name: ${dumpName}"
+ exit 1
+fi
+
+dumpFormat=$2
+
+if [[ "$dumpFormat" != "ttl" ]] && [[ "$dumpFormat" != "nt" ]]; then
+ echo "Unknown format: ${dumpFormat}"
+ exit 1
+fi
+
+filename="wikidata-${today}-${dumpName}-BETA"
+targetFileGzip="${targetDir}/${filename}.${dumpFormat}.gz"
+targetFileBzip2="${targetDir}/${filename}.${dumpFormat}.bz2"
+failureFile="/tmp/dumpwikidata${dumpFormat}-${dumpName}-failure"
+mainLogFile="/var/log/wikidatadump/dumpwikidata${dumpFormat}-${filename}-main.log"
+
+shards=5
+
+declare -A dumpNameToMinSize
+# Sanity check: Minimal size we expect each shard of a certain dump to have
+dumpNameToMinSize=(["all"]=$( expr 12500000000 / $shards ) ["truthy"]=$( expr 7500000000 / $shards ))
+
+# Try to create the dump (up to three times).
+retries=0
+
+while true; do
+ i=0
+ /bin/rm -f "$failureFile"
+
+ while [ $i -lt $shards ]; do
+ (
+ set -o pipefail
+ errorLog="/var/log/wikidatadump/dumpwikidata${dumpFormat}-${filename}-${i}.log"
+ "$php" "$multiversionscript" extensions/Wikidata/extensions/Wikibase/repo/maintenance/dumpRdf.php \
+ --wiki wikidatawiki \
+ --shard "$i" \
+ --sharding-factor "$shards" \
+ --format "$dumpFormat" \
+ --flavor "$dumpFlavor" 2>> "$errorLog" \
+ | "$gzip" -9 > "${tempDir}/wikidata${dumpFormat}-${dumpName}.${i}.gz"
+ exitCode=$?
+ if [ $exitCode -gt 0 ]; then
+ echo -e "\n\n($( date --iso-8601=minutes )) Process for shard ${i} failed with exit code ${exitCode}" >> "$errorLog"
+ echo 1 > "$failureFile"
+
+ # Kill all remaining dumpers and start over.
+ kill -- -$$
+ fi
+ ) &
+ let i++
+ done
+
+ wait
+
+ if [ -f "$failureFile" ]; then
+ # Something went wrong, let's clean up and maybe retry. Leave logs in place.
+ /bin/rm -f "$failureFile"
+ /bin/rm -f "${tempDir}/wikidata${dumpFormat}-${dumpName}."*.gz
+ let retries++
+ echo "($( date --iso-8601=minutes )) Dumping one or more shards failed. Retrying." >> "$mainLogFile"
+
+ if [ $retries -eq 3 ]; then
+ exit 1
+ fi
+
+ # Wait 10 minutes (in case of database problems or other instabilities), then try again
+ sleep 600
+ continue
+ fi
+
+ break
+
+done
+
+i=0
+while [ $i -lt $shards ]; do
+ tempFile="${tempDir}/wikidata${dumpFormat}-${dumpName}.${i}.gz"
+ if [ ! -f "$tempFile" ]; then
+ echo "${tempFile} does not exist. Aborting." >> "$mainLogFile"
+ exit 1
+ fi
+ fileSize=$( stat --printf="%s" "$tempFile" )
+ if [ $fileSize -lt ${dumpNameToMinSize["$dumpName"]} ]; then
+ echo "File size of ${tempFile} is only ${fileSize}. Aborting." >> "$mainLogFile"
+ exit 1
+ fi
+ /bin/cat "$tempFile" >> "${tempDir}/wikidata${dumpFormat}-${dumpName}.gz"
+ /bin/rm "$tempFile"
+ let i++
+done
+
+/bin/mv "${tempDir}/wikidata${dumpFormat}-${dumpName}.gz" "$targetFileGzip"
+/bin/ln -fs "${today}/${filename}.${dumpFormat}.gz" "${targetDirBase}/latest-${dumpName}.${dumpFormat}.gz"
+
+"$gzip" -dc "$targetFileGzip" | "$bzip2" -c > "${tempDir}/wikidata${dumpFormat}-${dumpName}.bz2"
+/bin/mv "${tempDir}/wikidata${dumpFormat}-${dumpName}.bz2" "$targetFileBzip2"
+/bin/ln -fs "${today}/${filename}.${dumpFormat}.bz2" "${targetDirBase}/latest-${dumpName}.${dumpFormat}.bz2"
+
+
+pruneOldDirectories
+pruneOldLogs
+runDcat
diff --git a/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
b/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
new file mode 100644
index 0000000..848ea2e
--- /dev/null
+++ b/modules/dumps/files/otherdumps/wikidata/wikidatadumps-shared.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+#############################################################
+# This file is maintained by puppet!
+# puppet:///modules/dumps/otherdumps/wikidata/wikidatadumps-shared.sh
+#############################################################
+#
+# Shared variable and function declarations for creating Wikidata dumps
+#
+# Marius Hoch < [email protected] >
+
+source /usr/local/etc/dump_functions.sh
+configfile="${confsdir}/wikidump.conf"
+
+today=`date +'%Y%m%d'`
+daysToKeep=70
+
+args="wiki:dir;output:public,temp;tools:php,gzip,bzip2"
+results=`python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args"`
+
+apacheDir=`getsetting "$results" "wiki" "dir"` || exit 1
+publicDir=`getsetting "$results" "output" "public"` || exit 1
+tempDir=`getsetting "$results" "output" "temp"` || exit 1
+php=`getsetting "$results" "tools" "php"` || exit 1
+gzip=`getsetting "$results" "tools" "gzip"` || exit 1
+bzip2=`getsetting "$results" "tools" "bzip2"` || exit 1
+
+for settingname in "apacheDir" "publicDir" "tempDir" "php" "gzip" "bzip2"; do
+ checkval "$settingname" "${!settingname}"
+done
+
+targetDirBase=$publicDir/other/wikibase/wikidatawiki
+targetDir=$targetDirBase/$today
+
+multiversionscript="${apacheDir}/multiversion/MWScript.php"
+
+# Create the dir for the day: This may or may not already exist, we don't care
+mkdir -p $targetDir
+
+# Remove dump-folders we no longer need (keep $daysToKeep days)
+function pruneOldDirectories {
+ # Just to be sure: If this were empty the below would work on /
+ if [ -z "$targetDirBase" ]; then
+ echo "Empty \$targetDirBase"
+ exit 1
+ fi
+
+ cutOff=$(( `date +%s` - `expr $daysToKeep + 1` * 24 * 3600)) # Timestamp from $daysToKeep + 1 days ago
+ foldersToDelete=`ls -d -r $targetDirBase/*` # $targetDirBase is known to be non-empty
+ for folder in $foldersToDelete; do
+ # Try to get the unix time from the folder name, if this fails we'll just
+ # keep the folder (as it's not a valid date, thus hasn't been created by this script).
+ creationTime=$(date --utc --date="$(basename $folder)" +%s 2>/dev/null)
+ if [ -n "$creationTime" ] && [ "$cutOff" -gt "$creationTime" ]; then
+ /bin/rm -rf $folder
+ fi
+ done
+}
+
+function pruneOldLogs {
+ # Remove old logs (keep 35 days)
+ find /var/log/wikidatadump/ -name 'dumpwikidata*-*-*.log' -mtime +36 -delete
+}
+
+function runDcat {
+ "$php" /usr/local/share/dcat/DCAT.php \
+ --config=/usr/local/etc/dcatconfig.json \
+ "--dumpDir=${targetDirBase}" \
+ "--outputDir=${targetDirBase}"
+}
diff --git a/modules/dumps/files/favicon.ico
b/modules/dumps/files/web/xmldumps/favicon.ico
similarity index 100%
rename from modules/dumps/files/favicon.ico
rename to modules/dumps/files/web/xmldumps/favicon.ico
Binary files differ
diff --git a/modules/dumps/files/logrotate.conf
b/modules/dumps/files/web/xmldumps/logrotate.conf
similarity index 86%
rename from modules/dumps/files/logrotate.conf
rename to modules/dumps/files/web/xmldumps/logrotate.conf
index 4ec0848..7f185aa 100644
--- a/modules/dumps/files/logrotate.conf
+++ b/modules/dumps/files/web/xmldumps/logrotate.conf
@@ -1,5 +1,6 @@
# logrotate config for datasets web logs
# This file is managed by Puppet
+# modules/dumps/web/xmldumps/logrotate.conf
/var/log/nginx/*.log
{
diff --git a/modules/dumps/manifests/addchangesdumps/README.txt
b/modules/dumps/manifests/addchangesdumps/README.txt
new file mode 100644
index 0000000..880ea49
--- /dev/null
+++ b/modules/dumps/manifests/addchangesdumps/README.txt
@@ -0,0 +1,2 @@
+placeholder, the relevant pieces of the snapshot and datasets modules
+will end up here.
diff --git a/modules/dumps/manifests/generation/client/nfs.pp
b/modules/dumps/manifests/generation/client/nfs.pp
new file mode 100644
index 0000000..19e4280
--- /dev/null
+++ b/modules/dumps/manifests/generation/client/nfs.pp
@@ -0,0 +1,22 @@
+class dumps::generation::client::nfs {
+ require_package('nfs-common')
+
+ file { [ '/mnt/dumpsdata' ]:
+ ensure => 'directory',
+ }
+
+ $dumpsdataserver = $::site ? {
+ 'eqiad' => 'dumpsdata1001.eqiad.wmnet',
+ default => 'dumpsdata1001.eqiad.wmnet',
+ }
+
+ mount { '/mnt/data':
+ ensure => 'mounted',
+ device => "${dumpsdataserver}:/data",
+ fstype => 'nfs',
+ name => '/mnt/dumpsdata',
+ options => 'bg,hard,tcp,rsize=8192,wsize=8192,intr,nfsvers=3',
+ require => File['/mnt/dumpsdata'],
+ remounts => false,
+ }
+}
diff --git a/modules/dumpsnfs/manifests/init.pp
b/modules/dumps/manifests/generation/server/nfs.pp
similarity index 84%
rename from modules/dumpsnfs/manifests/init.pp
rename to modules/dumps/manifests/generation/server/nfs.pp
index 45a2afd..19ef17b 100644
--- a/modules/dumpsnfs/manifests/init.pp
+++ b/modules/dumps/manifests/generation/server/nfs.pp
@@ -10,7 +10,7 @@
mode => '0444',
owner => 'root',
group => 'root',
- content => template('dumpsnfs/nfs_exports.erb'),
+ content => template('dumps/nfs/nfs_exports.erb'),
require => Package['nfs-kernel-server'],
}
@@ -29,7 +29,7 @@
mode => '0444',
owner => 'root',
group => 'root',
- content => template('dumpsnfs/default-nfs-common.erb'),
+ content => template('dumps/nfs/default-nfs-common.erb'),
require => Package['nfs-kernel-server'],
}
@@ -37,7 +37,7 @@
mode => '0444',
owner => 'root',
group => 'root',
- content => template('dumpsnfs/default-nfs-kernel-server.erb'),
+ content => template('dumps/nfs/default-nfs-kernel-server.erb'),
require => Package['nfs-kernel-server'],
}
diff --git a/modules/dumps/manifests/otherdumps.pp
b/modules/dumps/manifests/otherdumps.pp
new file mode 100644
index 0000000..cd9ebeb
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps.pp
@@ -0,0 +1,37 @@
+class dumps::otherdumps {
+ # fixme only vars needed should get passed
+ # fixme put requires in where needed
+
+ class {'::dumps::otherdumps::config':
+ user => $user,
+ confsdir => $confsdir,
+ repodir => $repodir,
+ otherdumpsdir => $otherdumpsdir,
+ apachedir => $apachedir,
+ dumpdatadir => $dumpdatadir,
+ }
+ class {'::dumps::otherdumps::daily':
+ user => $user,
+ confsdir => $confsdir,
+ repodir => $repodir,
+ otherdumpsdir => $otherdumpsdir,
+ apachedir => $apachedir,
+ dumpdatadir => $dumpdatadir,
+ }
+ class {'::dumps::otherdumps::weekly':
+ user => $user,
+ confsdir => $confsdir,
+ repodir => $repodir,
+ otherdumpsdir => $otherdumpsdir,
+ apachedir => $apachedir,
+ dumpdatadir => $dumpdatadir,
+ }
+ class {'::dumps::otherdumps::wikidata':
+ user => $user,
+ confsdir => $confsdir,
+ repodir => $repodir,
+ otherdumpsdir => $otherdumpsdir,
+ apachedir => $apachedir,
+ dumpdatadir => $dumpdatadir,
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/common.pp
b/modules/dumps/manifests/otherdumps/common.pp
new file mode 100644
index 0000000..9fd4322
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/common.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::common {
+ file { '/usr/local/etc/dump_functions.sh':
+ ensure => 'present',
+ path => '/usr/local/etc/dump_functions.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/dump_functions.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/config.pp
b/modules/dumps/manifests/otherdumps/config.pp
new file mode 100644
index 0000000..cb000c8
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/config.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::config (
+ $confsdir = undef,
+ $dumpdatadir = undef,
+ $apachedir = undef,
+) {
+ file { "${confsdir}":
+ ensure => 'directory',
+ path => $confsdir,
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ }
+ file { "${confsdir}/otherdumps.conf":
+ ensure => 'present',
+ path => "${confsdir}/otherdumps.conf",
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ content => template('dumps/otherdumps/otherdumps.conf.erb'),
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/daily.pp
b/modules/dumps/manifests/otherdumps/daily.pp
new file mode 100644
index 0000000..0951f3f
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/daily.pp
@@ -0,0 +1,29 @@
+class dumps::otherdumps::daily (
+ $user = undef,
+ $confsdir = undef,
+ $repodir = undef,
+ $otherdumpsdir = undef,
+) {
+ include ::dumps::otherdumps::daily::mediaperprojectlists
+
+ file { '/usr/local/bin/otherdumps-dailies.sh':
+ ensure => 'present',
+ path => '/usr/local/bin/otherdumps-dailies.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/dailies.sh',
+ }
+
+ cron { 'otherdumps-dailies':
+ ensure => 'present',
+ environment => '[email protected]',
+ user => $user,
+ command => "/usr/local/bin/otherdumps-dailies.sh --confsdir ${confsdir} --repodir ${repodir} --otherdumpsdir ${otherdumpsdir}",
+ minute => '10',
+ hour => '6',
+ weekday => '0',
+ require => File['/usr/local/bin/otherdumps-dailies.sh'],
+ }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/daily/mediatitles.pp
b/modules/dumps/manifests/otherdumps/daily/mediatitles.pp
new file mode 100644
index 0000000..3db5908
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/daily/mediatitles.pp
@@ -0,0 +1,12 @@
+class dumps::otherdumps::daily::mediaperprojectlists(
+ $user = undef,
+) {
+ file { '/usr/local/bin/create-media-per-project-lists.sh':
+ ensure => 'present',
+ path => '/usr/local/bin/create-media-per-project-lists.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/dailies/create-media-per-project-lists.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly.pp
b/modules/dumps/manifests/otherdumps/weekly.pp
new file mode 100644
index 0000000..427f07c
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly.pp
@@ -0,0 +1,48 @@
+class dumps::otherdumps::weekly(
+ $user = undef,
+ $confsdir = undef,
+ $repodir = undef,
+ $otherdumpsdir = undef,
+) {
+ class {'::dumps::otherdumps::weekly::categoriesrdf':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::weekly::cirrussearch':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::weekly::contentxlation':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::weekly::globalblocks':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::weekly::mediaperprojectlists':
+ user => $user,
+ }
+
+ file { '/usr/local/bin/otherdumps-weeklies.sh':
+ ensure => 'present',
+ path => '/usr/local/bin/otherdumps-weeklies.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies.sh',
+ require => Class['dumps::otherdumps::weekly::categoriesrdf',
+ 'dumps::otherdumps::weekly::cirrussearch',
+ 'dumps::otherdumps::weekly::contentxlation',
+ 'dumps::otherdumps::weekly::globalblocks',
+ 'dumps::otherdumps::weekly::mediaperprojectlists'],
+ }
+
+ cron { 'otherdumps-weeklies':
+ ensure => 'present',
+ environment => '[email protected]',
+ user => $user,
+ command => "/usr/local/bin/otherdumps-weeklies.sh --confsdir ${confsdir} --repodir ${repodir} --otherdumpsdir ${otherdumpsdir}",
+ minute => '10',
+ hour => '6',
+ weekday => '0',
+ require => File['/usr/local/bin/otherdumps-weeklies.sh'],
+ }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
b/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
new file mode 100644
index 0000000..9e0c0f5
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/categoriesrdf.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::weekly::categoriesrdf (
+ $user = undef,
+) {
+ file { '/var/log/categoriesrdf':
+ ensure => 'directory',
+ mode => '0755',
+ owner => $user,
+ }
+
+ logrotate::conf { 'categoriesrdf':
+ ensure => present,
+ source => 'puppet:///modules/dumps/otherdumps/logrot/logrotate.categoriesrdf',
+ }
+
+ file { '/usr/local/bin/dumpcategoriesrdf.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcategoriesrdf.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
b/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
new file mode 100644
index 0000000..253210e
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/cirrussearch.pp
@@ -0,0 +1,21 @@
+class dumps::otherdumps::weekly::cirrussearch(
+ $user = undef,
+) {
+ file { '/var/log/cirrusdump':
+ ensure => 'directory',
+ mode => '0755',
+ owner => $user,
+ }
+
+ logrotate::conf { 'cirrusdump':
+ ensure => present,
+ source => 'puppet:///modules/dumps/otherdumps/logrot/logrotate.cirrusdump',
+ }
+
+ file { '/usr/local/bin/dumpcirrussearch.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcirrussearch.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
b/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
new file mode 100644
index 0000000..2f6c6f5
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/contentxlation.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::weekly::contentxlation(
+ $user = undef,
+) {
+ file { '/usr/local/bin/dumpcontentxlation.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies/dumpcontentxlation.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
b/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
new file mode 100644
index 0000000..d7fbcc9
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/globalblocks.pp
@@ -0,0 +1,10 @@
+class dumps::otherdumps::weekly::globalblocks(
+ $user = undef,
+) {
+ file { '/usr/local/bin/dump-global-blocks.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies/dump-global-blocks.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
b/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
new file mode 100644
index 0000000..178c7c9
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/weekly/mediaperprojectlists.pp
@@ -0,0 +1,12 @@
+class dumps::otherdumps::weekly::mediaperprojectlists(
+ $user = undef,
+) {
+ file { '/usr/local/bin/create-media-per-project-lists.sh':
+ ensure => 'present',
+ path => '/usr/local/bin/create-media-per-project-lists.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/weeklies/create-media-per-project-lists.sh',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata.pp
b/modules/dumps/manifests/otherdumps/wikidata.pp
new file mode 100644
index 0000000..fb9d16d
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata.pp
@@ -0,0 +1,40 @@
+class dumps::otherdumps::wikidata(
+ $user = undef,
+ $confsdir = undef,
+ $repodir = undef,
+ $otherdumpsdir = undef,
+) {
+ class {'::dumps::otherdumps::wikidata::common':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::wikidata::json':
+ user => $user,
+ }
+ class {'::dumps::otherdumps::wikidata::rdf':
+ user => $user,
+ }
+
+ file { '/usr/local/bin/wikidata-weeklies.sh':
+ ensure => 'present',
+ path => '/usr/local/bin/wikidata-weeklies.sh',
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/wikidata-weeklies.sh',
+ require => Class['dumps::otherdumps::wikidata::common',
+ 'dumps::otherdumps::wikidata::json',
+ 'dumps::otherdumps::wikidata::rdf'],
+ }
+
+ cron { 'wikidata-weeklies':
+ ensure => 'present',
+ environment => '[email protected]',
+ user => $user,
+ command => "/usr/local/bin/wikidata-weeklies.sh --confsdir ${confsdir} --repodir ${repodir} --otherdumpsdir ${otherdumpsdir}",
+ minute => '10',
+ hour => '6',
+ weekday => '0',
+ require => File['/usr/local/bin/wikidata-weeklies.sh'],
+ }
+
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/common.pp
b/modules/dumps/manifests/otherdumps/wikidata/common.pp
new file mode 100644
index 0000000..172923d
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/common.pp
@@ -0,0 +1,31 @@
+class dumps::otherdumps::wikidata::common {
+ file { '/usr/local/bin/wikidatadumps-shared.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/wikidata/wikidatadumps-shared.sh',
+ }
+
+ file { '/var/log/wikidatadump':
+ ensure => 'directory',
+ mode => '0755',
+ owner => $user,
+ group => 'www-data',
+ }
+
+ git::clone { 'DCAT-AP':
+ ensure => 'present', # Don't automatically update.
+ directory => '/usr/local/share/dcat',
+ origin => 'https://gerrit.wikimedia.org/r/operations/dumps/dcat',
+ branch => 'master',
+ owner => $user,
+ group => 'www-data',
+ }
+
+ file { '/usr/local/etc/dcatconfig.json':
+ mode => '0644',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/wikidata/dcatconfig.json',
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/json.pp
b/modules/dumps/manifests/otherdumps/wikidata/json.pp
new file mode 100644
index 0000000..5ed7604
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/json.pp
@@ -0,0 +1,14 @@
+class dumps::otherdumps::wikidata::json(
+ $user = undef,
+) {
+ # nope, requires the user param. ugh
+ include ::dumps::otherdumps::wikidata::common
+
+ file { '/usr/local/bin/dumpwikidatajson.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatajson.sh',
+ require => Class['dumps::otherdumps::wikidata::common'],
+ }
+}
diff --git a/modules/dumps/manifests/otherdumps/wikidata/rdf.pp
b/modules/dumps/manifests/otherdumps/wikidata/rdf.pp
new file mode 100644
index 0000000..9bfa238
--- /dev/null
+++ b/modules/dumps/manifests/otherdumps/wikidata/rdf.pp
@@ -0,0 +1,14 @@
+class dumps::otherdumps::wikidata::rdf(
+ $user = undef,
+) {
+ # nope needs 'user' param. ugh
+ include ::dumps::otherdumps::wikidata::common
+
+ file { '/usr/local/bin/dumpwikidatardf.sh':
+ mode => '0755',
+ owner => 'root',
+ group => 'root',
+ source => 'puppet:///modules/dumps/otherdumps/wikidata/dumpwikidatardf.sh',
+ require => Class['dumps::otherdumps::wikidata::common'],
+ }
+}
diff --git a/modules/dumps/manifests/init.pp
b/modules/dumps/manifests/web/xmldumps.pp
similarity index 100%
rename from modules/dumps/manifests/init.pp
rename to modules/dumps/manifests/web/xmldumps.pp
diff --git a/modules/dumps/manifests/zim.pp b/modules/dumps/manifests/web/zim.pp
similarity index 100%
rename from modules/dumps/manifests/zim.pp
rename to modules/dumps/manifests/web/zim.pp
diff --git a/modules/dumps/manifests/xmldumps/README.txt
b/modules/dumps/manifests/xmldumps/README.txt
new file mode 100644
index 0000000..880ea49
--- /dev/null
+++ b/modules/dumps/manifests/xmldumps/README.txt
@@ -0,0 +1,2 @@
+placeholder, the relevant pieces of the snapshot and datasets modules
+will end up here.
diff --git a/modules/dumpsdirs/manifests/init.pp b/modules/dumps/nfs/dirs.pp
similarity index 100%
rename from modules/dumpsdirs/manifests/init.pp
rename to modules/dumps/nfs/dirs.pp
diff --git a/modules/dumpsnfs/templates/default-nfs-common.erb
b/modules/dumps/templates/nfs/default-nfs-common.erb
similarity index 100%
rename from modules/dumpsnfs/templates/default-nfs-common.erb
rename to modules/dumps/templates/nfs/default-nfs-common.erb
diff --git a/modules/dumpsnfs/templates/default-nfs-kernel-server.erb
b/modules/dumps/templates/nfs/default-nfs-kernel-server.erb
similarity index 100%
rename from modules/dumpsnfs/templates/default-nfs-kernel-server.erb
rename to modules/dumps/templates/nfs/default-nfs-kernel-server.erb
diff --git a/modules/dumpsnfs/templates/nfs_exports.erb
b/modules/dumps/templates/nfs/nfs_exports.erb
similarity index 100%
rename from modules/dumpsnfs/templates/nfs_exports.erb
rename to modules/dumps/templates/nfs/nfs_exports.erb
diff --git a/modules/dumps/templates/otherdumps/otherdumps.conf.erb
b/modules/dumps/templates/otherdumps/otherdumps.conf.erb
new file mode 100644
index 0000000..582589c
--- /dev/null
+++ b/modules/dumps/templates/otherdumps/otherdumps.conf.erb
@@ -0,0 +1,22 @@
+#############################################################
+# This file is maintained by puppet!
+# modules/dumps/templates/otherdumps/otherdumps.conf.erb
+#############################################################
+
+[wiki]
+dblist=<%= @apachedir -%>/dblists/all.dblist
+privatelist=<%= @apachedir -%>/dblists/private.dblist
+dir=<%= @apachedir -%>
+
+[output]
+public=<%= @dumpdatadir -%>/public
+temp=<%= @dumpdatadir -%>/temp
+
+[tools]
+php=/usr/bin/php5
+#php=/usr/bin/php
+mysql=/usr/bin/mysql
+mysqldump=/usr/bin/mysqldump
+gzip=/bin/gzip
+bzip2=/bin/bzip2
+sevenzip=/usr/bin/7za
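
For reference, the shard/retry pattern used by dumpwikidatajson.sh and
dumpwikidatardf.sh can be sketched in isolation as below. This is only an
illustration under stated assumptions, not the code in this change:
dump_shard, run_shards, workDir, maxRetries and retryDelay are hypothetical
names, the real scripts call the PHP dumper and gzip, and the sketch omits
the "kill -- -$$" step that stops the remaining dumpers. It also shows the
cleanup glob placed outside the quotes so that the shell expands it.

```shell
# Hypothetical stand-ins; none of these names come from the change itself.
workDir=$(mktemp -d)
failureFile="${workDir}/failure"
shards=3
maxRetries=3

# Placeholder for the real per-shard dumper invocation.
dump_shard() {
    echo "shard $1" > "${workDir}/dump.$1.gz"
}

# Run all shards in parallel; on any failure clean up and retry.
run_shards() {
    retries=0
    while true; do
        rm -f "$failureFile"
        i=0
        while [ "$i" -lt "$shards" ]; do
            # Each shard runs in a background subshell and leaves a
            # marker file behind if it fails.
            ( dump_shard "$i" || echo 1 > "$failureFile" ) &
            i=$(( i + 1 ))
        done
        wait
        if [ ! -f "$failureFile" ]; then
            return 0
        fi
        # The glob sits outside the quotes; inside them it would be
        # passed to rm literally and match nothing.
        rm -f "${workDir}/dump."*.gz
        retries=$(( retries + 1 ))
        if [ "$retries" -ge "$maxRetries" ]; then
            return 1
        fi
        # Default pause of 10 minutes, overridable for testing.
        sleep "${retryDelay:-600}"
    done
}

run_shards && echo "all shards written"
```

Setting retryDelay=0 and replacing dump_shard with a function that returns
non-zero exercises the failure path without waiting.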
--
To view, visit https://gerrit.wikimedia.org/r/377231
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib3ed35e886d330e6d9112791b84e44d493ffaaea
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: ArielGlenn <[email protected]>
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits