Maybe that nuance was there, but what I was trying to say is that
***raw*** pageview numbers themselves should not be the factor (whatever
percentage of the total you apply). Instead, use some calculation based
on pageviews combined with other factors, e.g. the order of magnitude of
the pageviews, so that every page within a given range gets the same
smoothed value.
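
A minimal sketch of the order-of-magnitude idea, assuming pageviews are
plain integer counts. The function name `pageview_bucket` is hypothetical
and is not taken from the suggester code. It maps every count to the
floor of its base-10 logarithm, so pages within the same range are
treated alike instead of ranking by raw count.

```python
def pageview_bucket(pageviews: int) -> int:
    """Return the order of magnitude (floor of log10) of a pageview count.

    Hypothetical smoothing: 1,000-9,999 views -> 3, 10,000-99,999 -> 4,
    and so on. Counts below 10 (including 0) all map to 0.
    """
    if pageviews < 10:
        return 0
    # For positive integers, the number of decimal digits minus one equals
    # floor(log10(n)), without any floating-point rounding surprises.
    return len(str(pageviews)) - 1


if __name__ == "__main__":
    # Two pages with very different raw counts land in the same bucket...
    print(pageview_bucket(2_000), pageview_bucket(8_000))  # 3 3
    # ...while a page with 100x more views is only two buckets higher.
    print(pageview_bucket(200_000))  # 5
```

The bucket, rather than the raw count, would then be fed into the score,
so a spike in views moves a page only when it crosses a power of ten.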

If you are saying that pageviews account for approximately a quarter of
the score, that seems a very large weight. Two typed letters, "po...",
have many possible completions, and the fact that Pornhub comes up early
because of the pageview factor is ... ummm ... thought-provoking. I would
not expect 1/4 of searches for "po..." to be for Pornhub, though I am not
aware that such data is available.
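
To make "approximately a quarter" concrete: in a weighted sum, a factor's
weight divided by the total weight is the largest share of the final
score that factor can contribute; it is not an estimate of how many
searches target a page. The sketch below uses the factors Erik lists in
the quoted message (incoming and outgoing wikilinks, external links,
redirects, headings, article size, pageviews), but the weight values are
hypothetical, chosen only so the pageview share comes out at 23%, and
every feature is assumed to be pre-normalised to [0, 1].

```python
# Hypothetical weights: the actual values in the WIP patch are not given in
# this thread. They are chosen only so that pageviews come out at 23%.
WEIGHTS = {
    "incoming_links": 0.25,
    "outgoing_links": 0.05,
    "external_links": 0.05,
    "redirects": 0.15,
    "headings": 0.07,
    "size": 0.20,
    "pageviews": 0.23,
}


def score(features: dict[str, float]) -> float:
    """Weighted average of features, each assumed normalised to [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS) / total


def share(name: str) -> float:
    """Largest fraction of the final score that one factor can contribute."""
    return WEIGHTS[name] / sum(WEIGHTS.values())


if __name__ == "__main__":
    print(f"pageview share: {share('pageviews'):.0%}")  # pageview share: 23%
    # A page that maxes out pageviews but scores 0 on everything else:
    print(round(score({"pageviews": 1.0}), 2))  # 0.23
```

Lowering the pageview weight, or feeding in a smoothed pageview value
instead of a raw one, both shrink how far pageviews alone can lift a page.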

Regards, Billinghurst

On Tue, Jan 26, 2016 at 6:30 PM, Erik Bernhardson
<[email protected]> wrote:
> On Mon, Jan 25, 2016 at 11:16 PM, billinghurst <[email protected]>
> wrote:
>>
>> For the purpose of this exercise I think that it is completely
>> reasonable for staff/developers to play with the factors and make sure
>> that this development is not causing offence. We want the focus to be
>> on the tool and what it can do, not to start a bunfight that detracts
>> from the goal.
>>
>> For full production, I do NOT think that it is reasonable for either
>> staff or developers to determine what is or is not offensive, and
>> whether a term should or should not be displayed. That determination
>> sits clearly with the community, and is part of a discussion to have
>> when the tool approaches full production and is handed to the
>> community. It is part of what the community can or will need to do.
>>
>> All that said, page views as a raw number should not be the
>> determinant of a suggestion. I will add a fuller comment to the
>> Phabricator ticket.
>>
>
> They aren't, and I hope no one was led to believe this was ever the
> intention. Page views are one factor. Currently the number of incoming
> wikilinks, outgoing wikilinks, external links, redirects, headings and
> the size of the article all have different weights. Page views are being
> added as another factor; the current WIP patch uses page views as ~23% of
> the final score (if my math is right).
>
>> Regards, Billinghurst
>>
>> On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <[email protected]> wrote:
>> > Hey David,
>> >
>> > Thanks for starting this discussion!
>> >
>> > On 22 January 2016 at 13:53, David Causse <[email protected]> wrote:
>> >>
>> >> http://en-suggesty.wmflabs.org/suggest.html is updated with a score
>> >> that integrates pageviews.
>> >>
>> >> Pageviews solve most of the problems we encountered with the previous
>> >> formula; unfortunately, we now see some porn-related suggestions.
>> >> - x will suggest xxx
>> >> - po will suggest pornhub in 2nd position, just below poland, and
>> >> pornhub is ranked #6 for the query 'p'
>> >
>> >
>> > As of right now, neither of these queries does this anymore. "x" now
>> > suggests "Xinjiang" as the top result, and "po" now suggests "Pope
>> > Francis" after "Poland"... which may or may not be more palatable than
>> > Pornhub, depending on your viewpoints and ideals! Generally, Wikipedians
>> > like to point out that Wikipedia is not censored. That said, it's still
>> > worth considering whether this is appropriate. I personally don't have
>> > much of a problem with the fact that certain search results might be a
>> > little offensive... but I do think that they're probably also not
>> > really that useful.
>> >
>> > Given how volatile this has made our search results, my sense is that
>> > we're giving page view data too much weight in the ranking. Is it as
>> > simple as tweaking a coefficient so that page views are still taken
>> > into consideration but with lower weight, or do we need to do something
>> > more involved? I created T124722 to track this work, and added it to
>> > our list of blockers for a wider rollout of the suggester.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> > --
>> > Dan Garry
>> > Lead Product Manager, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > discovery mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/discovery
>> >
>>