Maybe it was there as nuance; however, I was trying to say that ***raw*** pageview numbers themselves should not be the factor (whatever % of the total you apply). Some calculation based on pageviews together with other factors would be better, e.g. using the order of magnitude of the pageviews, so that all pages within that range share the same smoothing factor.
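A minimal sketch of the order-of-magnitude idea, in Python. The function names, the 0-to-1 scale assumed for the other factors, and the cap of 8 are all hypothetical and for illustration only; this is not the suggester's actual formula. The 0.23 default simply echoes the ~23% weight quoted in this thread.

```python
def pageview_magnitude(pageviews: int) -> int:
    """Order of magnitude of a raw pageview count (1,200,000 -> 6)."""
    if pageviews < 1:
        return 0
    # Digit count minus one is the floor of log10 for positive integers.
    return len(str(pageviews)) - 1


def blended_score(other_score: float, pageviews: int,
                  pageview_weight: float = 0.23,
                  max_magnitude: int = 8) -> float:
    """Blend a 0..1 score from the other factors with a smoothed
    pageview component (both scales are assumptions)."""
    magnitude = min(pageview_magnitude(pageviews), max_magnitude)
    pageview_component = magnitude / max_magnitude
    return (1 - pageview_weight) * other_score \
        + pageview_weight * pageview_component
```

With this kind of bucketing, a page with 1.2 million views and one with 9 million views get the same pageview component, so only the other factors separate them.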
If you are saying that pageviews make up approximately a quarter of the score, that seems to be a very large share. Two typed letters, "po...", have many possible completions, and that pornhub comes up early because of the pageview factor is ... ummm ... thought-provoking. I doubt that 1/4 of searches for "po..." are for pornhub, though I am not aware that such data is available.

Regards, Billinghurst

On Tue, Jan 26, 2016 at 6:30 PM, Erik Bernhardson <[email protected]> wrote:
> On Mon, Jan 25, 2016 at 11:16 PM, billinghurst <[email protected]>
> wrote:
>>
>> For the purpose of this exercise I think that it is completely
>> reasonable for staff/developers to play with the factors and make sure
>> that we are not having offence caused through this development. We
>> want the focus to be on the tool, and what it can do; not start a
>> bunfight and detract from the goal.
>>
>> For full production, I do NOT think that it is reasonable that either
>> staff or developers make the determination of what is or what is not
>> offensive, and whether a term should or should not be displayed. That
>> determination sits clearly with the community, and is part of a
>> discussion when the tool approaches full production and given to the
>> community. It is part of what the community can or will need to do.
>>
>> All that said, page views as a raw number should not be the
>> determinator of a suggestion. I will add fuller comment to the
>> phabricator ticket.
>>
>
> They aren't, and I hope no one was led to believe this was ever the
> intention. Page views is a factor. Currently the number of incoming
> wikilinks, outgoing wikilinks, external links, redirects, headings and
> the size of the article all have different weights. Page views is being
> added as another factor; the current WIP patch uses page views as ~23%
> of the final score (if my math is right).
>
>> Regards, Billinghurst
>>
>> On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <[email protected]> wrote:
>> > Hey David,
>> >
>> > Thanks for starting this discussion!
>> >
>> > On 22 January 2016 at 13:53, David Causse <[email protected]> wrote:
>> >>
>> >> http://en-suggesty.wmflabs.org/suggest.html is updated with a score
>> >> that integrates pageviews.
>> >>
>> >> Pageviews solve most of the problems we encountered in the previous
>> >> formula; unfortunately, we now see some porn-related suggestions.
>> >> - x will suggest xxx
>> >> - po will suggest pornhub just below poland in 2nd position. And is
>> >>   ranked #6 for the query 'p'
>> >
>> >
>> > As of right now, neither of these queries does this any more. "x" now
>> > suggests "Xinjiang" as the top result, and "po" now suggests "Pope
>> > Francis" after "Poland"... which may or may not be more palatable
>> > than Pornhub, depending on your viewpoints and ideals! Generally,
>> > Wikipedians like to point out that Wikipedia is not censored. That
>> > said, it's still worth considering whether this is appropriate or
>> > not. I personally don't have much of a problem with the fact that
>> > certain search results might be a little offensive... but I do think
>> > that they're probably also not really that useful.
>> >
>> > Given how volatile this has made our search results, my sense is
>> > that we're giving too much weight to how much we're letting page view
>> > data affect the ranking. Is it as simple as tweaking a coefficient so
>> > that page views are still taken into consideration but with lower
>> > weight, or do we need to do something more involved? I created
>> > T124722 to track this work, and added it to our list of blockers for
>> > a wider rollout of the suggester.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> > --
>> > Dan Garry
>> > Lead Product Manager, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > discovery mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/discovery
