Hello,
happy to join the discussion.

I also think that phonetic search would be a really good improvement;
currently you often have to search on Google and then copy and paste.

I am also experimenting with Elasticsearch, and thanks to this thread I
discovered that Wikipedia is also using it, through CirrusSearch. Could
phonetic search be applied only to *link names* (not the text) for
languages that currently have no phonetic support, with the results then
mapped onto ES?

E.g. for Chinese:
https://pypi.python.org/pypi/dragonmapper

Maybe ES also has its own support for this?
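
As a rough sketch of what I mean (an illustration only, not how CirrusSearch works): index only the titles under a phonetic key, then match what the user types against that key. The tiny hanzi-to-pinyin table and the function names below are made up for this example; in practice a library such as dragonmapper would supply the transliteration.

```python
from collections import defaultdict

# Toy transliteration table (hypothetical, for illustration only);
# a real setup would use a library such as dragonmapper instead.
TOY_PINYIN = {
    "北": "bei",
    "京": "jing",
    "上": "shang",
    "海": "hai",
    "南": "nan",
}


def phonetic_key(title):
    """Return a toneless pinyin key for a title, or None if a character is unknown."""
    syllables = []
    for char in title:
        if char not in TOY_PINYIN:
            return None
        syllables.append(TOY_PINYIN[char])
    return "".join(syllables)


def build_index(titles):
    """Map each phonetic key to the titles sharing it (titles only, no article text)."""
    index = defaultdict(list)
    for title in titles:
        key = phonetic_key(title)
        if key is not None:
            index[key].append(title)
    return index


def suggest(index, query):
    """Return titles whose phonetic key starts with the typed Latin-script query."""
    query = query.lower().replace(" ", "")
    if not query:
        return []
    matches = []
    for key in sorted(index):
        if key.startswith(query):
            matches.extend(index[key])
    return matches


titles = ["北京", "上海", "南京", "海南"]
index = build_index(titles)
print(suggest(index, "bei"))  # titles whose key starts with "bei"
print(suggest(index, "nan"))  # titles whose key starts with "nan"
```

The matched titles could then be passed to ES as an ordinary title query, so the article text itself would never need phonetic analysis.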




On Tue, Jan 26, 2016 at 8:30 AM, Erik Bernhardson <[email protected]> wrote:

> On Mon, Jan 25, 2016 at 11:16 PM, billinghurst <[email protected]> wrote:
>
>> For the purpose of this exercise I think that it is completely
>> reasonable for staff/developers to play with the factors and make sure
>> that we are not having offence caused through this development. We
>> want the focus to be on the tool, and what it can do; not start a
>> bunfight and detract from the goal.
>>
>> For full production, I do NOT think that it is reasonable that either
>> staff or developers make the determination of what is or what is not
>> offensive, and whether a term should or should not be displayed. That
>> determination sits clearly with the community, and is part of a
>> discussion when the tool approaches full production and given to the
>> community. It is part of what the community can or will need to do.
>>
>> All that said, page views as a raw number should not be the
>> determinant of a suggestion. I will add fuller comment to the
>> phabricator ticket.
>>
>>
> They aren't, and I hope no one was led to believe this was ever the
> intention. Page views are a factor. Currently the number of incoming
> wikilinks, outgoing wikilinks, external links, redirects, headings and the
> size of the article all have different weights. Page views are being added
> as another factor; the current WIP patch uses page views as ~23% of the
> final score (if my math is right).
>
>> Regards, Billinghurst
>>
>> On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <[email protected]> wrote:
>> > Hey David,
>> >
>> > Thanks for starting this discussion!
>> >
>> > On 22 January 2016 at 13:53, David Causse <[email protected]> wrote:
>> >>
>> >> http://en-suggesty.wmflabs.org/suggest.html is updated with a score
>> >> that integrates pageviews.
>> >>
>> >> Pageviews solve most of the problems we encountered in the previous
>> >> formula; unfortunately, we now see some porn-related suggestions.
>> >> - x will suggest xxx
>> >> - po will suggest pornhub just below poland in 2nd position. And is
>> >> ranked #6 for the query 'p'
>> >
>> >
>> > As of right now, neither of these queries does this any more. "x" now
>> > suggests "Xinjiang" as the top result, and "po" now suggests "Pope
>> > Francis" after "Poland"... which may or may not be more palatable than
>> > Pornhub, depending on your viewpoints and ideals! Generally,
>> > Wikipedians like to point out that Wikipedia is not censored. That
>> > said, it's still worth considering whether this is appropriate or not.
>> > I personally don't have much of a problem with the fact that certain
>> > search results might be a little offensive... but I do think that
>> > they're probably also not really that useful.
>> >
>> > Given how volatile this has made our search results, my sense is that
>> > we're giving page view data too much weight in the ranking. Is it as
>> > simple as tweaking a coefficient so that page views are still taken
>> > into consideration but with lower weight, or do we need to do something
>> > more involved? I created T124722 to track this work, and added it to
>> > our list of blockers for a wider rollout of the suggester.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> > --
>> > Dan Garry
>> > Lead Product Manager, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > discovery mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/discovery
>> >