Bearloga has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/375091 )
Change subject: SRP visit times label fixes
......................................................................
SRP visit times label fixes
Also added data checks & fixed a bug introduced with a new version
of tidyr (at least I think that's how the issue started)
Change-Id: Ia3f4e6b030858b382c0a7c336d6759d022ebf14e
---
M modules/page_visit_times.R
M server.R
M tab_documentation/survival.md
M ui.R
M utils.R
5 files changed, 56 insertions(+), 38 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/wikimedia/discovery/rainbow refs/changes/91/375091/1
diff --git a/modules/page_visit_times.R b/modules/page_visit_times.R
index 1321dd6..df1fbe9 100644
--- a/modules/page_visit_times.R
+++ b/modules/page_visit_times.R
@@ -22,7 +22,7 @@
tidyr::spread(label, time) %>%
polloi::reorder_columns() %>%
polloi::smoother(smooth_level =
polloi::smooth_switch(input$smoothing_global, input$smoothing_srp_ld_plot),
rename = FALSE) %>%
- polloi::make_dygraph(xlab = "", ylab = "Time (s)", title = "Time at N% users leave the search results page") %>%
+ polloi::make_dygraph(xlab = "", ylab = "Time (s)", title = "Time at which N% users leave the search results page") %>%
dyAxis("x", ticker = "Dygraph.dateTicker", axisLabelFormatter = polloi::custom_axis_formatter,
axisLabelWidth = 100, pixelsPerLabel = 80) %>%
dyRoller(rollPeriod = input$rolling_srp_ld_plot) %>%
diff --git a/server.R b/server.R
index 752f5ba..b91bcf9 100644
--- a/server.R
+++ b/server.R
@@ -80,18 +80,28 @@
polloi::check_past_week(mobile_load_data, "Mobile Web load times"),
polloi::check_yesterday(android_dygraph_set, "Android events"),
polloi::check_past_week(android_load_data, "Android load times"),
+ polloi::check_yesterday(position_prop, "clicked result positions"),
+ polloi::check_past_week(position_prop, "clicked result positions"),
+ polloi::check_yesterday(source_prop, "source of search on Android"),
+ polloi::check_past_week(source_prop, "source of search on Android"),
polloi::check_yesterday(ios_dygraph_set, "iOS events"),
polloi::check_past_week(ios_load_data, "iOS load times"),
- polloi::check_yesterday(dplyr::bind_rows(split_dataset), "API usage data"),
- polloi::check_past_week(dplyr::bind_rows(split_dataset), "API usage data"),
+ polloi::check_yesterday(dplyr::bind_rows(split_dataset, .id = "api"), "API usage data"),
+ polloi::check_past_week(dplyr::bind_rows(split_dataset, .id = "api"), "API usage data"),
polloi::check_yesterday(failure_data_with_automata, "zero results data"),
polloi::check_past_week(failure_data_with_automata, "zero results data"),
polloi::check_yesterday(suggestion_with_automata, "suggestions data"),
polloi::check_past_week(suggestion_with_automata, "suggestions data"),
polloi::check_yesterday(augmented_clickthroughs, "engagement % data"),
polloi::check_past_week(augmented_clickthroughs, "engagement % data"),
- polloi::check_yesterday(user_page_visit_dataset, "survival times"),
- polloi::check_past_week(user_page_visit_dataset, "survival times"))
+ polloi::check_yesterday(paulscore_fulltext, "full-text PaulScores"),
+ polloi::check_past_week(paulscore_fulltext, "full-text PaulScores"),
+ polloi::check_yesterday(sister_search_traffic, "sister search traffic"),
+ polloi::check_past_week(sister_search_traffic, "sister search traffic"),
+ polloi::check_yesterday(user_page_visit_dataset, "page survival times"),
+ polloi::check_past_week(user_page_visit_dataset, "page survival times"),
+ polloi::check_yesterday(serp_page_visit_dataset, "serp survival times"),
+ polloi::check_past_week(serp_page_visit_dataset, "serp survival times"))
notifications <- notifications[!vapply(notifications, is.null, FALSE)]
return(dropdownMenu(type = "notifications", .list = notifications))
})
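The `vapply` line above compacts the notification list: each `polloi::check_yesterday()`/`check_past_week()` call returns `NULL` when its dataset is current, or a notification item when it is stale. A minimal sketch of that idiom, with plain strings standing in for the notification items (the real return values are shinydashboard notification objects):

```r
# Toy stand-ins for check_* results: NULL means the data is fresh,
# a non-NULL value means a stale-data notification should be shown.
notifications <- list(NULL, "Android events are missing", NULL, "iOS load times are stale")

# Drop the NULLs; the FALSE argument tells vapply() to expect one
# logical scalar per element.
notifications <- notifications[!vapply(notifications, is.null, FALSE)]
```

After filtering, only the two non-NULL items remain, ready to be handed to `dropdownMenu(.list = ...)`.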
diff --git a/tab_documentation/survival.md b/tab_documentation/survival.md
index e066ad5..ae7ab59 100644
--- a/tab_documentation/survival.md
+++ b/tab_documentation/survival.md
@@ -1,15 +1,15 @@
-Automated survival analysis: page visit times
+How long searchers stay on the visited search results
=======
When someone is randomly selected for search satisfaction tracking (using our
[TSS2 schema](https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction2)),
we use a check-in system and survival analysis to estimate how long users stay
on visited pages. To summarize the results on a daily basis, we record a set of
statistics based on a measure formally known as "[median lethal
dose](https://en.wikipedia.org/wiki/Median_lethal_dose)".
-This graph shows the length of time that must pass before N% of the users leave the page they visited. When the number goes up, we can infer that users are staying on the pages longer. In general, it appears it takes 15s to lose 10%, 25-35s to lose 25%, and 55-75s to lose 50%.
+This graph shows the length of time that must pass before N% of the users leave the page (e.g. article) they visited. When the number goes up, we can infer that users are staying on the pages longer. In general, it appears it takes 15s to lose 10%, 25-35s to lose 25%, and 55-75s to lose 50%.
On most days, we retain at least 20% of the test population past the 7 minute
mark (the point at which the user's browser stops checking in), so on those
days we cannot calculate the time it takes to lose 90/95/99% of the users.
There are some days when we CAN calculate those times, and it can take
anywhere between 270s (4m30s) and 390s (6m30s) for 90% of the users to have
closed the page they clicked through from the search results page.
-Outages and inaccuracies
+Annotations
------
* '__R__': on 2017-01-01 we started calculating all of Discovery's metrics
using a new version of [our data retrieval and processing
codebase](https://phabricator.wikimedia.org/diffusion/WDGO/) that we migrated
to [Wikimedia Analytics](https://www.mediawiki.org/wiki/Analytics)'
[Reportupdater
infrastructure](https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater).
See [T150915](https://phabricator.wikimedia.org/T150915) for more details.
* '__S__': on 2017-04-25 we changed the rates at which users are put into
event logging (see [T163273](https://phabricator.wikimedia.org/T163273)).
Specifically, we decreased the rate on English Wikipedia ("EnWiki") and
increased it everywhere else.
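The "median lethal dose"-style statistics this documentation describes can be illustrated with a toy computation. This is only a sketch: the real pipeline uses a check-in system and survival analysis with censoring at the 7-minute (420 s) mark, which this simulation ignores, and the exponential visit-length distribution here is invented for illustration.

```r
# Hypothetical page-visit durations in seconds (the exponential shape and
# the 90 s mean are assumptions, not the real distribution of page visits).
set.seed(42)
visit_seconds <- rexp(1000, rate = 1 / 90)

# "Time at which N% of users leave" is the N-th percentile of visit length.
t10 <- unname(quantile(visit_seconds, 0.10))  # time to lose 10% of users
t50 <- unname(quantile(visit_seconds, 0.50))  # median "lethal dose": half gone
```

By construction, losing 50% of users always takes at least as long as losing 10%, which is why the dashboard's quantile curves never cross.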
diff --git a/ui.R b/ui.R
index bbed6f2..73c8980 100644
--- a/ui.R
+++ b/ui.R
@@ -69,8 +69,8 @@
menuItem(text = "Sister Search",
menuSubItem(text = "Traffic", tabName = "sister_search_traffic")),
menuItem(text = "Page Visit Times",
- menuSubItem(text = "Visited search results", tabName = "survival"),
- menuSubItem(text = "Search result pages", tabName = "spr_surv")),
+ menuSubItem(text = "Search result pages", tabName = "spr_surv"),
+ menuSubItem(text = "Visited search results", tabName = "survival")),
menuItem(text = "Language/Project Breakdown", tabName = "langproj_breakdown"),
menuItem(text = "Global Settings",
selectInput(inputId = "smoothing_global", label = "Smoothing", selectize = TRUE, selected = "day",
@@ -319,8 +319,8 @@
width = 3
),
column(
- numericInput("rolling_lethal_dose_plot", "Roll Period", 14, min = 1, max = 30),
- helpText("Each point is an average of this many days."),
+ numericInput("rolling_lethal_dose_plot", "Rolling Average*", 14, min = 1, max = 30),
+ helpText("* Each point will become an average of this many days."),
width = 3
),
column(
@@ -348,10 +348,10 @@
fluidRow(
column(
fluidRow(
- column(polloi::smooth_select("smoothing_srp_ld_plot"), width = 8),
- column(numericInput("rolling_srp_ld_plot", "Roll Period", 1, min = 1, max = 30), width = 4)
+ column(polloi::smooth_select("smoothing_srp_ld_plot"), width = 7),
+ column(numericInput("rolling_srp_ld_plot", "Rolling Average*", 1, min = 1, max = 30), width = 5)
),
- helpText("Each point is an average of this many days."),
+ helpText("* Each point will become an average of this many days."),
width = 3
),
column(
diff --git a/utils.R b/utils.R
index 27c4b0e..91f2376 100644
--- a/utils.R
+++ b/utils.R
@@ -346,28 +346,36 @@
projects <- sub(" \\([0-9]{1,2}\\.[0-9]{1,3}%\\)", "", projects)
if (length(languages) == 1 && languages[1] == "(None)") {
temp <- data %>%
- dplyr::filter_(~project %in% projects) %>%
+ dplyr::filter_(~ project %in% projects) %>%
dplyr::rename(wiki = project) %>%
dplyr::group_by(date, wiki)
- if (input_metric %in% c("User engagement", "Threshold-passing %", "Clickthrough rate")){
- temp %<>% dplyr::summarize(
- `Threshold-passing %` = round(sum(`Threshold-passing %` * search_sessions_threshold, na.rm = TRUE) / sum(search_sessions_threshold, na.rm = TRUE), 2),
- `Clickthrough rate` = round(sum(`Clickthrough rate` * `Result pages opened`, na.rm = TRUE) / sum(`Result pages opened`, na.rm = TRUE), 2),
- `User engagement` = round((`Threshold-passing %` + `Clickthrough rate`) / 2, 2))
- } else if (input_metric %in% c("clickthroughs", "Result pages opened", "search sessions")){
- temp %<>% dplyr::summarize(
- clickthroughs = round(sum(as.numeric(clickthroughs), na.rm = TRUE), 2),
- `Result pages opened` = round(sum(as.numeric(`Result pages opened`), na.rm = TRUE), 2),
- `search sessions` = round(sum(as.numeric(`search sessions`), na.rm = TRUE), 2))
- } else if (input_metric %in% c("F = 0.1", "F = 0.5", "F = 0.9")){
- temp %<>% dplyr::summarize(
- `F = 0.1` = round(sum(`F = 0.1` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2),
- `F = 0.5` = round(sum(`F = 0.5` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2),
- `F = 0.9` = round(sum(`F = 0.9` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2))
- } else{
- temp %<>% dplyr::summarize(
- zero_results = sum(as.numeric(zero_results), na.rm = TRUE),
- total = sum(as.numeric(total), na.rm = TRUE))
+ if (input_metric %in% c("User engagement", "Threshold-passing %", "Clickthrough rate")) {
+ temp %<>%
+ dplyr::summarize(
+ `Threshold-passing %` = round(sum(`Threshold-passing %` * search_sessions_threshold, na.rm = TRUE) / sum(search_sessions_threshold, na.rm = TRUE), 2),
+ `Clickthrough rate` = round(sum(`Clickthrough rate` * `Result pages opened`, na.rm = TRUE) / sum(`Result pages opened`, na.rm = TRUE), 2),
+ `User engagement` = round((`Threshold-passing %` + `Clickthrough rate`) / 2, 2)
+ )
+ } else if (input_metric %in% c("clickthroughs", "Result pages opened", "search sessions")) {
+ temp %<>%
+ dplyr::summarize(
+ clickthroughs = round(sum(as.numeric(clickthroughs), na.rm = TRUE), 2),
+ `Result pages opened` = round(sum(as.numeric(`Result pages opened`), na.rm = TRUE), 2),
+ `search sessions` = round(sum(as.numeric(`search sessions`), na.rm = TRUE), 2)
+ )
+ } else if (input_metric %in% c("F = 0.1", "F = 0.5", "F = 0.9")) {
+ temp %<>%
+ dplyr::summarize(
+ `F = 0.1` = round(sum(`F = 0.1` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2),
+ `F = 0.5` = round(sum(`F = 0.5` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2),
+ `F = 0.9` = round(sum(`F = 0.9` * `search sessions`, na.rm = TRUE) / sum(`search sessions`, na.rm = TRUE), 2)
+ )
+ } else {
+ temp %<>%
+ dplyr::summarize(
+ zero_results = sum(as.numeric(zero_results), na.rm = TRUE),
+ total = sum(as.numeric(total), na.rm = TRUE)
+ )
}
temp %<>% dplyr::ungroup()
} else {
@@ -376,15 +384,15 @@
tidyr::unite(wiki, language, project, sep = " ") %>%
dplyr::mutate(wiki = sub("(None) ", "", wiki, fixed = TRUE))
}
- if (input_metric == "Zero result rate"){
+ if (input_metric == "Zero result rate") {
temp %<>%
dplyr::mutate(zrr = round(100 * as.numeric(zero_results) / as.numeric(total), 2)) %>%
dplyr::select(-c(total, zero_results)) %>%
tidyr::spread(wiki, zrr)
} else {
temp %<>%
- dplyr::select_(.dots = c("date", "wiki", paste0("`", input_metric, "`"))) %>%
- tidyr::spread_(., key_col = "wiki", value_col = input_metric, fill = 0)
+ dplyr::select_(.dots = c("date", "wiki", "val" = paste0("`", input_metric, "`"))) %>%
+ tidyr::spread(., wiki, val, fill = 0)
}
return(temp)
}
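The `spread_()` → `spread()` change above is the tidyr fix mentioned in the commit message: the standard-evaluation verb `spread_()` was soft-deprecated and its behavior shifted across tidyr versions, so the selected metric column is first renamed to a fixed `val` and then reshaped with the ordinary `spread()`. A minimal sketch of that pattern with toy data, assuming a tidyr version that still exports the pre-1.0 `spread()` API:

```r
library(tidyr)

# Toy long-format data: one row per (date, wiki), with the chosen metric
# already renamed to a fixed "val" column, mirroring the select_() call above.
temp <- data.frame(
  date = as.Date("2017-08-01") + c(0, 0, 1, 1),
  wiki = c("dewiki", "enwiki", "dewiki", "enwiki"),
  val = c(5, 10, 7, 12)
)

# One row per date, one column per wiki; absent combinations are filled with 0.
wide <- spread(temp, wiki, val, fill = 0)
```

Because the key and value columns now have fixed names, the same call works no matter which metric (including ones with spaces and backticks in their names) the user selected.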
--
To view, visit https://gerrit.wikimedia.org/r/375091
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia3f4e6b030858b382c0a7c336d6759d022ebf14e
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/discovery/rainbow
Gerrit-Branch: develop
Gerrit-Owner: Bearloga <[email protected]>