Stefan.petrea has submitted this change and it was merged.

Change subject: Fixing problem with cronjob [IN PROGRESS]
......................................................................


Fixing problem with cronjob [IN PROGRESS]

  * added more error-checks and validity checks in Sequential.pm to
    avoid warnings in .err files (they appeared in /tmp/pageviews-full-cron/)
  * added more docs to clear up what the cronjob does (also cleared up
    separation of concerns from that point of view)
  * moved and renamed files depending on the environment
        - local-cron-*
        - stat1-cron-*

Change-Id: I99f085722e85b576e931c23bd0df2c118db98005
---
R pageviews_reports/bin/generate-docs
A pageviews_reports/bin/local-cron-install.sh
C pageviews_reports/bin/local-cron-script.sh
R pageviews_reports/bin/pageviews.pl
R pageviews_reports/bin/stat1-cron-install.sh
R pageviews_reports/bin/stat1-cron-script.sh
R pageviews_reports/bin/syntax-check
R pageviews_reports/conf/local-full-cron.json
M pageviews_reports/conf/local-restricted.json
M pageviews_reports/conf/stat1-full-cron.json
M pageviews_reports/lib/PageViews/Model/Sequential.pm
D pageviews_reports/lib/PageViews/View.pm
M pageviews_reports/overview.pod
13 files changed, 59 insertions(+), 56 deletions(-)

Approvals:
  Stefan.petrea: Verified; Looks good to me, approved
  jenkins-bot: Verified



diff --git a/pageviews_reports/generate-docs b/pageviews_reports/bin/generate-docs
similarity index 100%
rename from pageviews_reports/generate-docs
rename to pageviews_reports/bin/generate-docs
diff --git a/pageviews_reports/bin/local-cron-install.sh b/pageviews_reports/bin/local-cron-install.sh
new file mode 100755
index 0000000..4088893
--- /dev/null
+++ b/pageviews_reports/bin/local-cron-install.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+# Since logs are written to disk on stat1's time at around 06:30
+# we place this job at 07:20 to be sure the needed files are in place
+USER=`whoami`
+CRON_SCRIPT=/a/wikistats_git/pageviews_reports/bin/local-cron-script.sh
+crontab  -l | { cat; echo  "22 * * * * $USER . /home/$USER/.bashrc; /bin/bash $CRON_SCRIPT"; } | crontab -
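
The installer above relies on a common crontab(1) append idiom: `crontab -l` lists the current table, the brace group copies it through `cat` and appends one new line, and `crontab -` installs the combined result. A minimal sketch of the same pipeline, run against a plain string instead of the live crontab so it has no side effects (the pre-existing entry here is made up for the demo):

```shell
# Build the combined crontab text without touching any real crontab.
existing='0 6 * * * /usr/local/bin/nightly-backup.sh'   # hypothetical entry
new_entry='22 * * * * /bin/bash /a/wikistats_git/pageviews_reports/bin/local-cron-script.sh'

# Same shape as `crontab -l | { cat; echo "..."; } | crontab -`,
# minus the final install step.
combined=$(printf '%s\n' "$existing" | { cat; echo "$new_entry"; })
printf '%s\n' "$combined"
```

Because the brace group only appends, any entries already installed survive; running the installer twice would, however, add a duplicate line.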
diff --git a/pageviews_reports/cron-script.sh b/pageviews_reports/bin/local-cron-script.sh
similarity index 79%
copy from pageviews_reports/cron-script.sh
copy to pageviews_reports/bin/local-cron-script.sh
index 8955e3a..8b79fb0 100755
--- a/pageviews_reports/cron-script.sh
+++ b/pageviews_reports/bin/local-cron-script.sh
@@ -15,7 +15,7 @@
 # Clean up mappers output from previous run
 rm -f $OUTPUT_DIR/map/*;
 /usr/bin/env perl -I$MOBILE_PAGEVIEWS_DIR/lib            \
-                    $MOBILE_PAGEVIEWS_DIR/pageviews.pl   \
-                    $MOBILE_PAGEVIEWS_DIR/conf/stat1-full-cron.json 2>&1 >/tmp/cperlerr;
+                    $MOBILE_PAGEVIEWS_DIR/bin/pageviews.pl   \
+                    $MOBILE_PAGEVIEWS_DIR/conf/local-restricted.json 2>&1 >/tmp/cperlerr;
 cp $OUTPUT_DIR/PageViewsPerMonthAll.csv \
    $WIKISTATS_DIR/dumps/csv/csv_sp/ ;
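
An aside on the `2>&1 >/tmp/cperlerr` redirection this script keeps: shells process redirections left to right, so stderr is duplicated onto the *old* stdout (the terminal, or cron's mail) before stdout is pointed at the file — only stdout lands in /tmp/cperlerr. A small demonstration of the two orderings (file names here are invented for the demo):

```shell
# A function that writes one line to each stream.
demo() { echo "to stdout"; echo "to stderr" >&2; }

# Order used in the cron script: stderr follows the *original* stdout,
# so the file receives only the stdout line.
demo 2>&1 >/tmp/redir-demo-a.txt

# Reversed order: stdout goes to the file first, then stderr is
# duplicated onto it, so the file receives both lines.
demo >/tmp/redir-demo-b.txt 2>&1
```

This ordering may be why Perl warnings ended up in the cron job's .err files rather than in /tmp/cperlerr.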
diff --git a/pageviews_reports/pageviews.pl b/pageviews_reports/bin/pageviews.pl
similarity index 91%
rename from pageviews_reports/pageviews.pl
rename to pageviews_reports/bin/pageviews.pl
index 2830294..2dbbc5f 100755
--- a/pageviews_reports/pageviews.pl
+++ b/pageviews_reports/bin/pageviews.pl
@@ -56,8 +56,13 @@
 my $model;
 my $view ;
 
+# If $config->{end}->{custom} is "previous-month"
+#
+# then get the current time and chop off days until the previous month is reached
+#
+# and then use that as the end month for processing
 
-if($config->{end}->{custom} eq "previous-month") {
+if(defined($config->{end}->{custom}) && $config->{end}->{custom} eq "previous-month") {
   my $c = localtime;
   my $p = $c;
   while($c->mon == $p->mon){
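
The "previous-month" logic can be exercised standalone. This sketch assumes the script's `localtime` is Time::Piece's (core Perl since 5.10, whose objects support `->mon` and date arithmetic via Time::Seconds); it steps back one day at a time until the month changes, then prints that month:

```shell
# Step back day by day from "now" until the month changes,
# then print the resulting (previous) month.
perl -MTime::Piece -MTime::Seconds -e '
  my $c = localtime;                        # now
  my $p = $c;
  $p -= ONE_DAY while $p->mon == $c->mon;   # chop off days
  printf "%04d-%02d\n", $p->year, $p->mon;  # the previous month
'
```

The loop always terminates within 31 iterations, since subtracting a day eventually crosses the month boundary.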
diff --git a/pageviews_reports/cron-install.sh b/pageviews_reports/bin/stat1-cron-install.sh
similarity index 78%
rename from pageviews_reports/cron-install.sh
rename to pageviews_reports/bin/stat1-cron-install.sh
index a086fa5..5dba1bb 100755
--- a/pageviews_reports/cron-install.sh
+++ b/pageviews_reports/bin/stat1-cron-install.sh
@@ -2,5 +2,5 @@
 # Since logs are written to disk on stat1's time at around 06:30
 # we place this job at 07:20 to be sure the needed files are in place
 USER=`whoami`
-CRON_SCRIPT=/a/wikistats_git/pageviews_reports/cron-script.sh
+CRON_SCRIPT=/a/wikistats_git/pageviews_reports/bin/stat1-cron-script.sh
 crontab  -l | { cat; echo  "20 7 01 * * $USER . /home/$USER/.bashrc; /bin/bash $CRON_SCRIPT"; } | crontab -
diff --git a/pageviews_reports/cron-script.sh b/pageviews_reports/bin/stat1-cron-script.sh
similarity index 83%
rename from pageviews_reports/cron-script.sh
rename to pageviews_reports/bin/stat1-cron-script.sh
index 8955e3a..7e886f7 100755
--- a/pageviews_reports/cron-script.sh
+++ b/pageviews_reports/bin/stat1-cron-script.sh
@@ -14,8 +14,8 @@
 /bin/date            >> /tmp/cperlver;
 # Clean up mappers output from previous run
 rm -f $OUTPUT_DIR/map/*;
-/usr/bin/env perl -I$MOBILE_PAGEVIEWS_DIR/lib            \
-                    $MOBILE_PAGEVIEWS_DIR/pageviews.pl   \
+/usr/bin/env perl -I$MOBILE_PAGEVIEWS_DIR/lib                \
+                    $MOBILE_PAGEVIEWS_DIR/bin/pageviews.pl   \
                     $MOBILE_PAGEVIEWS_DIR/conf/stat1-full-cron.json 2>&1 >/tmp/cperlerr;
 cp $OUTPUT_DIR/PageViewsPerMonthAll.csv \
    $WIKISTATS_DIR/dumps/csv/csv_sp/ ;
diff --git a/pageviews_reports/syntax-check b/pageviews_reports/bin/syntax-check
similarity index 100%
rename from pageviews_reports/syntax-check
rename to pageviews_reports/bin/syntax-check
diff --git a/pageviews_reports/conf/local.json b/pageviews_reports/conf/local-full-cron.json
similarity index 100%
rename from pageviews_reports/conf/local.json
rename to pageviews_reports/conf/local-full-cron.json
diff --git a/pageviews_reports/conf/local-restricted.json b/pageviews_reports/conf/local-restricted.json
index b52c9af..851cc3c 100644
--- a/pageviews_reports/conf/local-restricted.json
+++ b/pageviews_reports/conf/local-restricted.json
@@ -2,13 +2,13 @@
   "model"                : "parallel",
   "max-children"         : 8,
   "input-path"           : "/home/user/wikidata/raw_gzips",
-  "children-output-path" : "/tmp/pageviews/map",
-  "output-path"          : "/tmp/pageviews",
-  "output-formats"       : ["web","json","wikireport"],
+  "children-output-path" : "/tmp/pageviews-full-cron/map",
+  "output-path"          : "/tmp/pageviews-full-cron",
+  "output-formats"       : ["json","wikireport"],
   "logs-prefix"          : "sampled",
   "restrictions"         : {
     "days-of-each-month"   : [2,3],
-    "lines-for-each-day"   : 100000
+    "lines-for-each-day"   : 20000
   },
   "start"    : {
     "year"   : 2012,
diff --git a/pageviews_reports/conf/stat1-full-cron.json b/pageviews_reports/conf/stat1-full-cron.json
index 7749671..3bbe0cb 100644
--- a/pageviews_reports/conf/stat1-full-cron.json
+++ b/pageviews_reports/conf/stat1-full-cron.json
@@ -4,7 +4,7 @@
   "input-path"           : "/a/squid/archive/sampled",
   "children-output-path" : "/tmp/pageviews-full-cron/map",
   "output-path"          : "/tmp/pageviews-full-cron",
-  "output-formats"       : ["json","web","wikireport"],
+  "output-formats"       : ["json","wikireport"],
   "logs-prefix"          : "sampled",
   "start"    : {
     "year"   : 2012,
diff --git a/pageviews_reports/lib/PageViews/Model/Sequential.pm b/pageviews_reports/lib/PageViews/Model/Sequential.pm
index 15669bf..d733b23 100644
--- a/pageviews_reports/lib/PageViews/Model/Sequential.pm
+++ b/pageviews_reports/lib/PageViews/Model/Sequential.pm
@@ -236,7 +236,9 @@
     }elsif( index($path_fragment,"w/index.php" ,0)!=-1) {
       $retval->{"pageview-type"} = "wiki_index";
     }elsif( index($path_fragment,"w/api.php"   ,0)!=-1) {
-      my $url_params    = { split(/&|=/,$c[8]) };
+      my @kv           = split(/&|=/,$c[8]);
+      return undef if(~~@kv % 2 == 1);
+      my $url_params    = { @kv };
       $retval->{"pageview-type"} = "api";
       $retval->{action}  = $url_params->{action};
       $retval->{"title"} = $url_params->{page}  || 
@@ -350,7 +352,7 @@
   my ($self,$mime_type) = @_;
   ## text/html mime types only
   ## (mimetype filtering only occurs for regular pageviews, not for the API ones)
-  if( $mime_type =~ m{text/html|text/vnd\.wap\.wml|application/json}i ) {
+  if(defined($mime_type) && $mime_type =~ m{text/html|text/vnd\.wap\.wml|application/json}i ) {
     return 1;
   };
   $self->{counts_discarded_mimetype}->{$self->{last_ymd}}++;
diff --git a/pageviews_reports/lib/PageViews/View.pm b/pageviews_reports/lib/PageViews/View.pm
deleted file mode 100644
index e2fc3c9..0000000
--- a/pageviews_reports/lib/PageViews/View.pm
+++ /dev/null
@@ -1,42 +0,0 @@
-package PageViews::View;
-use strict;
-use warnings;
-use Template;
-use Carp;
-
-sub new {
-  my ($class,$data) = @_;
-  my $raw_obj = {
-    data => $data,
-  };
-  my $obj     = bless $raw_obj,$class;
-  return $obj;
-};
-
-sub render {
-  my ($self,$params) = @_;
-
-  confess "[ERR] expected param output_path"
-    unless exists $params->{output_path};
-  confess "[ERR] output_path doesn't exist on disk"
-    unless     -d $params->{output_path};
-
-  my $output_path = $params->{output_path};
-
-  `mkdir -p $output_path`;
-  my $tt = Template->new({
-      INCLUDE_PATH => "./templates",
-      OUTPUT_PATH  => $output_path,
-      DEBUG        => 1,
-  }); 
-  $tt->process(
-    "pageviews.tt",
-    $self->{data} ,
-    "pageviews.html",
-  ) || confess $tt->error();
-
-  `cp -r static/ $output_path`;
-
-};
-
-1;
diff --git a/pageviews_reports/overview.pod b/pageviews_reports/overview.pod
index 9422c63..39c7914 100644
--- a/pageviews_reports/overview.pod
+++ b/pageviews_reports/overview.pod
@@ -82,6 +82,38 @@
 
 =end html
 
+
+
 =head1 Documentation
 
-This documentation was generated using pandoc.
+=begin html
+
+This documentation is generated (using pandoc and some custom markup conversion) by <b>generate-docs</b>.
+
+It is generated in 3 formats:
+
+<ul>
+<li> HTML
+<li> PDF
+<li> Mediawiki
+</ul>
+
+=end html
+
+=cut
+
+
+=head1 Cronjob
+
+=begin html
+
+This cron job is installed by <b>stat1-cron-install.sh</b>. At this point the job runs on the <b>1st of every month at 07:20</b>, stat1 time.
+
+The cron job sources .bashrc, cleans up the output directory in /tmp/pageviews-full-cron, then runs the pageviews report, which produces a csv and copies it to /a/wikistats_git/dumps/csv/csv_sp/, where we expect the csv to be picked up by a different cronjob (belonging to the wikistats codebase).
+
+=end html
+
+=cut
+

-- 
To view, visit https://gerrit.wikimedia.org/r/67010
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I99f085722e85b576e931c23bd0df2c118db98005
Gerrit-PatchSet: 5
Gerrit-Project: analytics/wikistats
Gerrit-Branch: master
Gerrit-Owner: Stefan.petrea <[email protected]>
Gerrit-Reviewer: Stefan.petrea <[email protected]>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
