Forwarding to dev list, which is where it belongs :)
-------- Forwarded Message --------
Subject: Re: incubator-ponymail git commit: 'hot topics' feature should use terms, not significant_terms
Date: Sun, 8 Jan 2017 01:01:07 +0100
From: Daniel Gruno <[email protected]>
Reply-To: [email protected]
To: [email protected]

On 01/08/2017 12:55 AM, [email protected] wrote:
> Repository: incubator-ponymail
> Updated Branches:
>   refs/heads/master e153c4abc -> 2ebf5e7a7
>
>
> 'hot topics' feature should use terms, not significant_terms
>
> This fixes #329
>
> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/2ebf5e7a
> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/2ebf5e7a
> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/2ebf5e7a
>
> Branch: refs/heads/master
> Commit: 2ebf5e7a735f54042c6c59d80a932bb4bc6a96cd
> Parents: e153c4a
> Author: Sebb <[email protected]>
> Authored: Sat Jan 7 23:55:21 2017 +0000
> Committer: Sebb <[email protected]>
> Committed: Sat Jan 7 23:55:21 2017 +0000
>
> ----------------------------------------------------------------------
>  CHANGELOG.md       | 1 +
>  site/api/stats.lua | 6 ++++--
>  tools/setup.py     | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
> ----------------------------------------------------------------------
>
>
> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/2ebf5e7a/CHANGELOG.md
> ----------------------------------------------------------------------
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> index b440207..14e9c30 100644
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -109,6 +109,7 @@
>  - absolute URLs must be prefixed with URLBase in JS files (#327)
>  - cannot use absolute URLs in HTML pages (#328)
>  - setup.py now prompts for shard and replica counts when creating the index (#313)
> +- 'hot topics' feature should use terms, not significant_terms (#329)
>
>  ## CHANGES in 0.9b:
>
>
> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/2ebf5e7a/site/api/stats.lua
> ----------------------------------------------------------------------
> diff --git a/site/api/stats.lua b/site/api/stats.lua
> index a8f11ec..62da9f1 100644
> --- a/site/api/stats.lua
> +++ b/site/api/stats.lua
> @@ -30,6 +30,8 @@ local days = {
>  }
>
>  local BODY_MAXLEN = config.stats_maxBody or 200
> +-- words to exclude from word cloud:
> +local EXCLUDE = config.stats_wordExclude or ".|..|..."
>
>  local function sortEmail(thread)
>      if thread.children and type(thread.children) == "table" then
> @@ -411,10 +413,10 @@ function handle(r)
>              terminate_after = 100,
>              aggs = {
>                  cloud = {
> -                    significant_terms = {
> +                    terms = {
>                          field = "subject",
>                          size = 10,
> -                        chi_square = {}
> +                        exclude = EXCLUDE
>                      }
>                  }
>              },

Exqueeze me?
significant_terms is specifically used, so, for instance, apache lists,
don't get "apache" as a hot topic. It has to be a topic that is
"trending in this query, but not in general", not "what do we have most
of around here".

If we just use terms, the result becomes utterly useless, as it does not
take into account how common those terms are in, let's say, the ASF in
general.

Consider this a -1 to that commit unless you can convince me otherwise.

With regards,
Daniel.

>
> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/2ebf5e7a/tools/setup.py
> ----------------------------------------------------------------------
> diff --git a/tools/setup.py b/tools/setup.py
> index c02116a..19d4bd5 100755
> --- a/tools/setup.py
> +++ b/tools/setup.py
> @@ -501,6 +501,7 @@ local config = {
>      full_headers = false,
>      maxResults = 5000, -- max emails to return in one go. Might need to be bumped for large lists
>  -- stats_maxBody = 200, -- max size of body snippet returned by stats.lua
> +-- stats_wordExclude = ".|..|...", -- patterns to exclude from word cloud generated by stats.lua
>      admin_oauth = {}, -- list of domains that may do administrative oauth (private list access)
>                        -- add 'www.googleapis.com' to the list for google oauth to decide, for instance.
>      oauth_fields = { -- used for specifying individual oauth handling parameters.
>
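
For anyone following the -1 above, the difference between "what do we have most of around here" and "trending in this query, but not in general" can be seen in a small standalone Python sketch. This is not Pony Mail code and it does not talk to Elasticsearch. The sample subjects, the function names (doc_freq, plain_terms, significant) and the scoring formula (a simple foreground-versus-background comparison, not the chi_square heuristic removed by the commit) are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical subjects matched by the current query (the "foreground").
FOREGROUND = [
    "apache ponymail elasticsearch upgrade",
    "apache ponymail wordcloud bug",
    "apache elasticsearch mapping",
    "apache ponymail release",
]

# Hypothetical subjects for the whole index (the "background"),
# which by construction includes the foreground.
BACKGROUND = FOREGROUND + [
    "apache release vote",
    "apache board report",
    "apache infra outage",
    "apache license question",
    "apache release candidate",
    "apache mentor signoff",
    "apache podling report",
    "apache website update",
]


def doc_freq(subjects):
    """Count how many subjects contain each word (once per subject)."""
    return Counter(word for s in subjects for word in set(s.lower().split()))


def plain_terms(foreground, size=2):
    """Rank words by raw frequency in the query results only."""
    counts = doc_freq(foreground)
    return sorted(counts, key=lambda w: (-counts[w], w))[:size]


def significant(foreground, background, size=2):
    """Rank words by how much more common they are in the query results
    than in the index overall (illustrative score, assumed formula)."""
    fg, bg = doc_freq(foreground), doc_freq(background)
    scores = {}
    for word, n in fg.items():
        fg_rate = n / len(foreground)
        bg_rate = bg[word] / len(background)  # > 0: background contains foreground
        if fg_rate > bg_rate:
            scores[word] = (fg_rate - bg_rate) * (fg_rate / bg_rate)
    return sorted(scores, key=lambda w: (-scores[w], w))[:size]


if __name__ == "__main__":
    print("plain terms:      ", plain_terms(FOREGROUND))
    print("significant terms:", significant(FOREGROUND, BACKGROUND))
```

With this toy data the raw count ranks "apache" first, because it appears in every subject. The foreground-versus-background comparison drops it entirely, since it is just as common in the index as a whole, and ranks "ponymail" and "elasticsearch" on top instead.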
