jenkins-bot has submitted this change and it was merged. Change subject: Update Rel Lab: documentation, better error handling, more configurability ......................................................................
Update Rel Lab: documentation, better error handling, more configurability README.md - first pass at basic documentation relcomp.py - catch and report on search errors - make report sections collapsible (and collapse them) - make number of examples printed configurable from the command line - refactor query string formatting - refactor ascii-ification of metric results relevancyRunner.py - make queries, labHost, config, and searchCommand global [settings] that can be overridden by config under each [test#] cqd.py - remove repeated limit init relevance.ini - add config and docs for setting examples printed per metric - added docs for global vs local settings - moved queries and config under global [settings] Bug: T126646 Change-Id: Ib5ef1717883ddfce1ec8b3cfd6fd2fdf19a86a7f --- A README.md M cqd.py M relcomp.py M relevance.ini M relevancyRunner.py 5 files changed, 270 insertions(+), 42 deletions(-) Approvals: DCausse: Looks good to me, approved jenkins-bot: Verified diff --git a/README.md b/README.md new file mode 100644 index 0000000..ad36500 --- /dev/null +++ b/README.md @@ -0,0 +1,148 @@ +# Relevanc(e|y) Lab<sup>*</sup> + +The primary purpose of the Relevance Lab is to allow us<sup>†</sup> to experiment with proposed modifications to our search process and gauge their effectiveness<sup>‡</sup> and impact<sup>§</sup> before releasing them into production, and even before doing any kind of user acceptance or A/B testing. Also, testing in the relevance lab gives an additional benefit over A/B tests (esp. in the case of very targeted changes): with A/B tests we aren't necessarily able to test the behavior of the *same query* with two different configurations. + +<small> +\* Both *relevance* and *relevancy* are attested. They mean [the same thing](https://en.wiktionary.org/wiki/relevance#Alternative_forms "See Wiktionary"). We want to be inclusive, so either is allowed. Note that *Rel Lab* saves several keystrokes and avoids having to choose. 
+ + † Appropriate values of "us" include the Discovery team, other WMF teams, and potentially the wider community of Wiki users and developers. + + ‡ "Does it do anything good?" + + § "How many searches does it affect?" + </small> + + ## Prerequisites + + * Python: There's nothing too fancy here, and it works with Python 2.7, though a few packages are required: + * The package `jsonpath-rw` is required by the main Rel Lab. + * The package `termcolor` is required by the Cirrus Query Debugger. + * If you don't have one of these packages, you can get it with `pip install <package-name>` (`sudo` may be required to install packages). + * SSH access to the host you intend to connect to. + + ## Invocation + + The main Rel Lab process is `relevancyRunner.py`, which takes a `.ini` config file (see below): + +     relevancyRunner.py -c relevance.ini + + ### Processes + + `relevancyRunner.py` parses the `.ini` file (see below), manages configuration, runs the queries against the Elasticsearch cluster and outputs the results, then delegates diffing the results to the `jsonDiffTool` specified in the `.ini` file, and delegates the final report to the `metricTool` specified in the `.ini` file. It also archives the original queries and configuration (`.ini` and JSON `config` files) with the Rel Lab run output. + + The `jsonDiffTool` is implemented as `jsondiff.py`, "an almost smart enough JSON diff tool". It's actually not that smart: it munges the search results JSON a bit, pretty-prints it, and then uses Python's HtmlDiff to make reasonably pretty output. + + The `metricTool` is implemented as `relcomp.py`, which generates an HTML report comparing two relevance lab query runs. A number of metrics are defined, including the zero results rate and generic top-N diffs (sorted or not). Adding and configuring these metrics can be done in `main`, in the array `myMetrics`. 
Examples of queries that change from one run to the next for each metric are provided, with links into the diffs created by `jsondiff.py`. + + Running the queries is typically the most time-consuming part of the process. If you ask for a very large number of results for each query (≫100), the diff step can be very slow. The report processing is generally very quick. + + ### Configuration + + The Rel Lab is configured by way of an `.ini` file. A sample, `relevance.ini`, is provided. Global settings are provided in `[settings]`, and config for the two test runs is in `[test1]` and `[test2]`. + + Additional command line arguments can be added to `searchCommand` to affect the way the queries are run (such as what wiki to run against, changing the number of results returned, and including detailed scoring information). + + The number of examples printed per metric is configurable on the `metricTool` command line. + + See `relevance.ini` for more details on the command line arguments. + + Each `[test#]` contains the `name` of the query set, and the file containing the `queries` (see Input below). Optionally, a JSON `config` file can be provided, which is passed to `runSearch.php` on the command line. These JSON configurations should be formatted as a single line. + + The settings `queries`, `labHost`, `config`, and `searchCommand` can be specified globally under `[settings]` or per-run under `[test#]`. If both exist, `[test#]` will override `[settings]`. + + #### Example JSON configs: + + * `{"wgCirrusSearchFunctionRescoreWindowSize": 1, "wgCirrusSearchPhraseRescoreWindowSize": 1}` + * Set the Function Rescore Window Size to 1, and set the Phrase Rescore Window Size to 1. + + * `{"wgCirrusSearchAllFields": {"use": false}}` + * Set `$wgCirrusSearchAllFields['use']` to `false`. + + * `{"wgCirrusSearchClusters":{"default": [{"host":"nobelium.eqiad.wmnet", "port":"80"}]}}` + * Forward queries to the Nobelium cluster, which uses non-default port 80. 
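As a sketch of how a JSON `config` reaches the search command: `relevancyRunner.py` reads the file and appends its contents, shell-quoted, as `--options` (the runner uses `pipes.quote` under Python 2; `shlex.quote` below is the Python 3 spelling). The `search_command` value here is a placeholder, not an actual `searchCommand` setting:

```python
import json
import shlex

# Placeholder for the searchCommand setting from the .ini file.
search_command = "php runSearch.php"

# JSON configs must be a single line so they survive shell quoting intact.
options = {"wgCirrusSearchFunctionRescoreWindowSize": 1,
           "wgCirrusSearchPhraseRescoreWindowSize": 1}
single_line = json.dumps(options, separators=(",", ":"))

# The runner shell-quotes the config blob before appending it.
cmdline = search_command + " --options " + shlex.quote(single_line)
print(cmdline)
```

The quoting matters because the config is sent through `ssh` to the lab host; an unquoted JSON blob would be mangled by the shell.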
+ + ## Input + + Queries should be formatted as Unicode text, with one query per line in the file specified under `queries`. Typically, the same queries file would be used by both runs, and the JSON `config` would be the only difference between the runs. + + However, you could have different queries in two different files (e.g., one with quotes and one with the quotes removed). Queries are compared sequentially. That is, the first one in one file is compared to the first one in the other file, etc. + + Query input should not contain tabs. + + + ## Output + + By default, Rel Lab run results are written out to the `relevance/` directory. This can be configured via `workDir` under `[settings]` in the `.ini` file. + + A directory for each query set is created in the `relevance/queries/` directory. The directory is a "safe" version of the `name` given under `[test#]`. This directory contains the queries, the results, and a copy of the JSON config file used, if any, under the name `config.json`. + + A directory for each comparison between `[test1]` and `[test2]` is created in the `relevance/comparisons/` directory. The name is a concatenation of the "safe" versions of the `name`s given to the query sets. The original `.ini` file is copied to `config.ini`, the final report is in `report.html`, and the diffs are stored in the `diffs/` directory, named in order as `diff#.html`. + + + ## Other Tools + + There are a few other bits and bobs included with the Rel Lab. + + ### Cirrus Query Debugger + + The Cirrus Query Debugger (`cqd.py`) is a command line tool to display various debugging information for individual queries. + + Run `cqd.py --help` for more details. + + Note that `cqd.py` requires the `termcolor` package. 
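The sequential pairing of results and the zero-results metric described above can be sketched as follows. The toy result lines are made up for illustration; the `totalHits`, `query`, and `error` fields are the ones `relcomp.py` actually inspects:

```python
import json

def paired_results(lines_a, lines_b):
    """Pair run A's results with run B's positionally, as relcomp.py does;
    pairs where either side reported an error are skipped."""
    for line_a, line_b in zip(lines_a, lines_b):
        a, b = json.loads(line_a), json.loads(line_b)
        if "error" in a or "error" in b:
            continue
        yield a, b

def zero_results_rate(results):
    """Percentage of results with totalHits == 0 (the Zero Results metric)."""
    if not results:
        return 0.0
    zeros = sum(1 for r in results if r.get("totalHits") == 0)
    return 100.0 * zeros / len(results)

# Toy result lines standing in for two runs' `results` files.
run_a = ['{"query": "cat", "totalHits": 5}',
         '{"query": "dg", "totalHits": 0}',
         '{"error": "search timed out"}']
run_b = ['{"query": "cat", "totalHits": 7}',
         '{"query": "dg", "totalHits": 2}',
         '{"query": "bird", "totalHits": 1}']
pairs = list(paired_results(run_a, run_b))               # error pair skipped
baseline_zrr = zero_results_rate([a for a, b in pairs])  # 50.0
```

The per-pair bookkeeping in the real `relcomp.py` (diff numbering, example collection, HTML output) is omitted here.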
+ + Helpful hint: If you want to pipe the output of `cqd.py` through `less`, use `less`'s `-R` option, which makes it understand and preserve the color output from `cqd.py`. Depending on which part of the output you are using most, you might also want `less`'s `-S` option, which doesn't wrap lines (arrow left and right to see long lines). + + ### Import Indices + + Import Indices (`importindices.py`) downloads Elasticsearch indices from Wikimedia dumps and imports them to an Elasticsearch cluster. It lives with the Rel Lab but is used on the Elasticsearch server you connect to, not your local machine. + + ### Miscellaneous + + The `misc/` directory contains additional useful stuff: + + * `fulltextQueriesSample.hql` contains a well-commented example HQL query to run against Hive to extract a sample query set of fulltext queries. + + ### Gerrit Config + + These files help Gerrit process patches correctly and are not directly part of the Rel Lab: + + * `setup.cfg` + * `tox.ini` + + ## Options! + + There are lots of options that can be passed as JSON in `config` files, or as options to the Cirrus Query Debugger (specifically, or generally using the custom `-c` option). + + For more details on what the options do, see `CirrusSearch.php` in the [CirrusSearch extension](https://www.mediawiki.org/wiki/Extension:CirrusSearch). + + For reference, here are some options and their names in JSON, the Cirrus Query Debugger (CQD), or the web API (API names are available using `-c` with CQD). + + * *Phrase Window*—Default: 512; JSON: `wgCirrusSearchPhraseRescoreWindowSize`; CQD: `-pw`; API: `cirrusPhraseWindow`. + + * *Function Window*—Default: 8196; JSON: `wgCirrusSearchFunctionRescoreWindowSize`; CQD: `-fw`; API: `cirrusFunctionWindow`. + + * *Rescore Profile*—Default: default; CQD: `-rp`; + * default: boostlinks and templates by default + optional criteria activated by special syntax (namespaces, prefer-recent, language, ...) 
+ * default_noboostlinks: default minus boostlinks + * empty (will be deployed soon) + + * *All Fields*—Default: true/yes; JSON: `wgCirrusSearchAllFields`; CQD: `--allField`; API: `cirrusUseAllFields`. + * JSON default: {"use": true} + + * *Phrase Boost*—Default: 10; JSON: `wgCirrusSearchPhraseRescoreBoost`; API: `cirrusPhraseBoost`. + + * *Phrase Slop*—Default: 1; JSON: `wgCirrusSearchPhraseSlop`; API: `cirrusPhraseSlop`. + * API sets the `boost` sub-value + * JSON default: {"boost": 1, "precise": 0, "default": 0} + + * *Boost Links*—Default: true/yes; JSON: `wgCirrusSearchBoostLinks`; API: `cirrusBoostLinks`. + + * *Common Terms Query*—Default: false/no; JSON: `wgCirrusSearchUseCommonTermsQuery`; API: `cirrusUseCommonTermsQuery`. + + * *Common Terms Query Profile*—Default: default; API: `cirrusCommonTermsQueryProfile`. + * default: requires 4 terms in the query to be activated + * strict: requires 6 terms in the query to be activated + * aggressive_recall: requires 3 terms in the query to be activated + + See also the "[more like](https://www.mediawiki.org/wiki/Help:CirrusSearch#morelike:)" options. 
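For example, several of the JSON-settable options above could be combined into a single-line `config` file. The values below are purely illustrative, not recommendations:

```python
import json

# Illustrative values only; see CirrusSearch.php for what each option does.
config = {
    "wgCirrusSearchPhraseRescoreWindowSize": 512,
    "wgCirrusSearchPhraseRescoreBoost": 10,
    "wgCirrusSearchPhraseSlop": {"boost": 1, "precise": 0, "default": 0},
    "wgCirrusSearchAllFields": {"use": True},
    "wgCirrusSearchUseCommonTermsQuery": False,
}

# The runner expects the whole config on a single line:
config_line = json.dumps(config)
print(config_line)
```

The resulting line can be saved as, say, a hypothetical `test1.json` and referenced from `config` under `[test1]`.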
diff --git a/cqd.py b/cqd.py index 2c742f2..523d6b0 100755 --- a/cqd.py +++ b/cqd.py @@ -61,7 +61,6 @@ def __init__(self, args): self.limit = args.limit self.offset = args.offset - self.limit = args.limit self.functionWindow = args.functionWindow self.phraseWindow = args.phraseWindow self.rescoreProfile = args.rescoreProfile diff --git a/relcomp.py b/relcomp.py index 6c902c2..cee3148 100755 --- a/relcomp.py +++ b/relcomp.py @@ -104,20 +104,7 @@ """Add example diff to b2d_diff (delta=False) or d2b_diff (delta=True) """ - query_string = b_query = d_query = "" - - if "query" in b: - b_query = b["query"] - if "query" in d: - d_query = d["query"] - - if b_query == d_query: - query_string = b_query - else: - query_string = u"{} / {}".format(b_query, d_query) - - if query_string == "": - query_string = "[no-query-string]" + query_string = make_query_string(b, d) if delta: self.d2b_diff.append([index, query_string]) @@ -148,13 +135,16 @@ if self.raw_count: ret_string += "<b>{}:</b> {}{}".format(self.name, count, diffstr) else: + q_pct = 100*count/float(self.total_queries) if self.total_queries else 0 ret_string += "<b>{}:</b> {:.1f}%{}".format( - self.name, 100*count/float(self.total_queries), diffstr + self.name, q_pct, diffstr ) - return ret_string + "<br>\n" + ret_string += "<br>\n" + return ret_string.encode('ascii', 'xmlcharrefreplace') elif self.printnum > 0: # diff - ret_string = "<b>{}:</b><br>\n".format(self.name) + ret_string = "<b>{}:</b>\n".format(self.name) + ret_string += toggle_string() printed = 0 if self.printset == "random": # shuffle, unless all will be printed, then don't bother @@ -181,7 +171,8 @@ printed += 1 if printed >= self.printnum: break - return ret_string + "<br>\n" + ret_string += "</span>\n<br>\n" + return ret_string.encode('ascii', 'xmlcharrefreplace') return "" @@ -196,9 +187,10 @@ __metaclass__ = ABCMeta - def __init__(self): + def __init__(self, printnum=20): super(ZeroResultsRate, self).__init__("Zero Results", - symbols=["↓", "↑"]) + 
symbols=["↓", "↑"], + printnum=printnum) def has_condition(self, x, y): """Simple check: is totalHits == 0? @@ -215,12 +207,12 @@ __metaclass__ = ABCMeta - def __init__(self, topN=5, sorted=False): + def __init__(self, topN=5, sorted=False, printnum=20): sortstr = "Sorted" if sorted else "Unsorted" self.sorted = sorted self.topN = topN super(TopNDiff, self).__init__("Top {} {} Results Differ".format(topN, sortstr), - symmetric=True) + symmetric=True, printnum=printnum) def has_condition(self, x, y): if "totalHits" in x: @@ -263,21 +255,78 @@ return not len(x) == 0 -def print_report(target_dir, diff_count, file1, file2, myMetrics): +def make_query_string(x, y): + query_string = x_query = y_query = "" + + if "query" in x: + x_query = x["query"] + if "query" in y: + y_query = y["query"] + + if x_query == y_query: + query_string = x_query + else: + query_string = u"{} / {}".format(x_query, y_query) + + if query_string == "": + query_string = "[no-query-string]" + + return query_string + + +def print_report(target_dir, diff_count, file1, file2, myMetrics, errors): report_file = open(target_dir + "report.html", "w") report_file.write(textwrap.dedent("""\ + <script> + function toggle (button, span) {{ + sp = document.getElementById(span); + if (sp.style.display == 'none' || sp.style.display == '') {{ + button.innerHTML = '[ – ]'; + sp.style.display = 'inline'; + }} + else {{ + button.innerHTML = '[ + ]'; + sp.style.display = 'none'; + }} + }} + </script> + + <style> + .button {{cursor:pointer}} + .toggle {{display:none}} + </style> + <h2>Comparison run summary: {}</h2> <blockquote> <b>Stats:</b> {} query pairs compared<br> + """).format(target_dir, diff_count)) + + if len(errors): + report_file.write("<br>\n<font color=red><b>QUERY PAIRS WITH ERRORS: " + + "{}</b></font>\n".format(len(errors))) + report_file.write(toggle_string()) + printed = 0 + keylist = errors.keys() + shuffle(keylist) + for e in keylist: + report_file.write(" <font color=red>ERROR</font> " + + "<a 
href='diffs/diff{}.html'>{}</a><br>\n". + format(e, errors[e].encode('ascii', 'xmlcharrefreplace'))) + printed += 1 + if printed >= 50: + break + report_file.write("</span>\n") + + report_file.write(textwrap.dedent("""\ </blockquote> <h3>Baseline: {}</h3> <blockquote> <b>Metrics:</b><br> - """).format(target_dir, diff_count, file1)) + """).format(file1)) for m in myMetrics: - report_file.write(m.results("baseline").encode('ascii', 'xmlcharrefreplace')) + report_file.write(m.results("baseline")) report_file.write(textwrap.dedent("""\ </blockquote> @@ -288,7 +337,7 @@ """).format(file2)) for m in myMetrics: - report_file.write(m.results("delta").encode('ascii', 'xmlcharrefreplace')) + report_file.write(m.results("delta")) report_file.write(textwrap.dedent("""\ </blockquote> @@ -298,9 +347,16 @@ """)) for m in myMetrics: - report_file.write(m.results().encode('ascii', 'xmlcharrefreplace')) + report_file.write(m.results()) report_file.write("</blockquote>") + + +def toggle_string(): + toggle_string.num += 1 + return("<span onclick='toggle(this,\"toggle{}\")' class=button>".format(toggle_string.num) + + "[ + ]</span><br>\n<span id=toggle{} class=toggle>\n".format(toggle_string.num)) +toggle_string.num = 0 def main(): @@ -311,22 +367,29 @@ parser.add_argument("file", nargs=2, help="files to diff") parser.add_argument("-d", "--dir", dest="dir", default="./comp/", help="output directory, default is ./comp/") + parser.add_argument("-p", "--printnum", dest="printnum", default=20, + help="number of samples per metric, default is 20") args = parser.parse_args() (file1, file2) = args.file target_dir = args.dir + "/" + printnum = int(args.printnum) if not os.path.exists(target_dir): os.makedirs(os.path.dirname(target_dir)) diff_count = 0 + errors = {} # set up metrics + # TODO: make this configurable from the .ini file myMetrics = [ QueryCount(), - ZeroResultsRate(), - TopNDiff(5, sorted=False), - TopNDiff(5, sorted=True) + ZeroResultsRate(printnum=printnum), + TopNDiff(3, 
sorted=False, printnum=printnum), + TopNDiff(3, sorted=True, printnum=printnum), + TopNDiff(5, sorted=False, printnum=printnum), + TopNDiff(5, sorted=True, printnum=printnum) ] with open(file1) as a, open(file2) as b: @@ -342,10 +405,15 @@ bjson = json.loads(bline) diff_count += 1 + + if 'error' in ajson or 'error' in bjson: + errors[diff_count] = make_query_string(ajson, bjson) + continue + for m in myMetrics: m.measure(ajson, bjson, diff_count) - print_report(target_dir, diff_count, file1, file2, myMetrics) + print_report(target_dir, diff_count, file1, file2, myMetrics, errors) if __name__ == "__main__": diff --git a/relevance.ini b/relevance.ini index d15d57f..3a6b4da 100644 --- a/relevance.ini +++ b/relevance.ini @@ -11,15 +11,19 @@ ; JSON Diff tool jsonDiffTool = python jsondiff.py -d ; Comparison/metric reporting tool -metricTool = python relcomp.py -d +; additional params should go before -d +; -p 100 to set the number of examples printed per metric to 100 (defaults to 20) +metricTool = python relcomp.py -p 20 -d +; queries to be run +queries = test.q [test1] name = Test 1 -queries = test1.q -;config = test1.json +config = test1.json [test2] name = Test 2 -queries = test2.q ;config = test2.json +; labHost, searchCommand, queries, and config can be specified globally under [settings] or locally under [test#]. Local settings override global settings. 
+; config is optional \ No newline at end of file diff --git a/relevancyRunner.py b/relevancyRunner.py index f416279..9a96433 100755 --- a/relevancyRunner.py +++ b/relevancyRunner.py @@ -39,16 +39,24 @@ qname = getSafeName(config.get(section, 'name')) qdir = config.get('settings', 'workDir') + "/queries/" + qname refreshDir(qdir) - cmdline = config.get('settings', 'searchCommand') + cmdline = config.get(section, 'searchCommand') if config.has_option(section, 'config'): cmdline += " --options " + pipes.quote(open(config.get(section, 'config')).read()) shutil.copyfile(config.get(section, 'config'), qdir + '/config.json') # archive search config runCommand("cat %s | ssh %s %s > %s" % (config.get(section, 'queries'), - config.get('settings', 'labHost'), + config.get(section, 'labHost'), pipes.quote(cmdline), qdir + "/results")) shutil.copyfile(config.get(section, 'queries'), qdir + '/queries') # archive queries return qdir + "/results" + + +def distributeGlobalSettings(config, globals, sections, settings): + # if settings are missing from sections, copy from globals + for sec in sections: + for set in settings: + if not config.has_option(sec, set) and config.has_option(globals, set): + config.set(sec, set, config.get(globals, set)) def checkSettings(config, section, settings): @@ -69,10 +77,11 @@ config = ConfigParser.ConfigParser() config.readfp(open(args.config)) -checkSettings(config, 'settings', ['labHost', 'workDir', 'jsonDiffTool', - 'metricTool', 'searchCommand']) -checkSettings(config, 'test1', ['name', 'queries']) -checkSettings(config, 'test2', ['name', 'queries']) +distributeGlobalSettings(config, 'settings', ['test1', 'test2'], + ['queries', 'labHost', 'searchCommand', 'config']) +checkSettings(config, 'settings', ['workDir', 'jsonDiffTool', 'metricTool']) +checkSettings(config, 'test1', ['name', 'queries', 'labHost', 'searchCommand']) +checkSettings(config, 'test2', ['name', 'queries', 'labHost', 'searchCommand']) res1 = runSearch(config, 'test1') res2 = 
runSearch(config, 'test2') -- To view, visit https://gerrit.wikimedia.org/r/271356 To unsubscribe, visit https://gerrit.wikimedia.org/r/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib5ef1717883ddfce1ec8b3cfd6fd2fdf19a86a7f Gerrit-PatchSet: 5 Gerrit-Project: wikimedia/discovery/relevancylab Gerrit-Branch: master Gerrit-Owner: Tjones <[email protected]> Gerrit-Reviewer: DCausse <[email protected]> Gerrit-Reviewer: EBernhardson <[email protected]> Gerrit-Reviewer: Smalyshev <[email protected]> Gerrit-Reviewer: jenkins-bot <> _______________________________________________ MediaWiki-commits mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
