Hello, happy to join the discussion. I also think that phonetic search would be a really good improvement; currently you often end up searching on Google and then copying and pasting the result.
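To make concrete what I mean by phonetic search, here is a minimal sketch in Python. It is only an illustration under my own assumptions: it uses classic Soundex (which is English-oriented) as a stand-in phonetic key, and the helper names `build_index` and `search` are made up for this example. It is not how CirrusSearch or Elasticsearch do it; for other scripts the key function would have to be replaced by a transliteration step such as hanzi-to-pinyin.

```python
from collections import defaultdict

# Classic Soundex digit groups; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex key of `word` (ASCII letters only)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept
    return (code + "000")[:4]


def build_index(titles):
    """Hypothetical helper: group page titles by their phonetic key."""
    index = defaultdict(list)
    for title in titles:
        index[soundex(title)].append(title)
    return dict(index)


def search(index, query):
    """Hypothetical helper: return the titles whose key matches the query's key."""
    return index.get(soundex(query), [])


if __name__ == "__main__":
    idx = build_index(["Poland", "Pope Francis", "Xinjiang"])
    print(search(idx, "Polend"))
```

The point is only that the index is built over titles (not article text), so a misspelled or transliterated query can still be mapped to an existing title.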
I am also experimenting with Elasticsearch, and thanks to this thread I discovered that Wikipedia also uses it, through CirrusSearch. Could phonetic search be applied only to *link names* (not the text) for languages that are currently not phonetically supported, and the results then mapped back onto ES? E.g. for Chinese: https://pypi.python.org/pypi/dragonmapper
Maybe ES also has its own support?

On Tue, Jan 26, 2016 at 8:30 AM, Erik Bernhardson <[email protected]> wrote:

> On Mon, Jan 25, 2016 at 11:16 PM, billinghurst <[email protected]>
> wrote:
>
>> For the purpose of this exercise I think that it is completely
>> reasonable for staff/developers to play with the factors and make sure
>> that we are not having offence caused through this development. We
>> want the focus to be on the tool, and what it can do; not start a
>> bunfight and detract from the goal.
>>
>> For full production, I do NOT think that it is reasonable that either
>> staff or developers make the determination of what is or what is not
>> offensive, and whether a term should or should not be displayed. That
>> determination sits clearly with the community, and is part of a
>> discussion when the tool approaches full production and given to the
>> community. It is part of what the community can or will need to do.
>>
>> All that said, page views as a raw number should not be the
>> determinator of a suggestion. I will add fuller comment to the
>> phabricator ticket.
>>
>>
> They arn't, and i hope noone was led to believe this was ever the
> intention. Page views is a factor. Currently the number of incoming
> wikilinks, outgoing wikilinks, external links, redirects, headings and the
> size of the article all have different weights. Page views is being added
> as another factor, the current WIP patch uses page views as ~23% of the
> final score (if my math is right).
>
> Regards, Billinghurst
>>
>> On Tue, Jan 26, 2016 at 9:37 AM, Dan Garry <[email protected]> wrote:
>> > Hey David,
>> >
>> > Thanks for starting this discussion!
>> >
>> > On 22 January 2016 at 13:53, David Causse <[email protected]> wrote:
>> >>
>> >> http://en-suggesty.wmflabs.org/suggest.html is updated with a score that
>> >> integrates pageviews.
>> >>
>> >> Pageviews solve most of the problems we encountered in the previous
>> >> formula unfortunately we now see some porn related suggestions.
>> >> - x will suggest xxx
>> >> - po will suggest pornhub just below poland in 2nd position. And is ranked
>> >> #6 for the query 'p'
>> >
>> >
>> > As of right now, neither of these queries do this any more. "x" now suggests
>> > "Xinjiang" as the top result, and "po" now suggests "Pope Francis" after
>> > "Poland"... which may or may not be more palatable than Pornhub, depending
>> > on your viewpoints and ideals! Generally, Wikipedians like to point out that
>> > Wikipedia is not censored. That said, it's still worth considering whether
>> > this is appropriate or not. I personally don't have much of a problem with
>> > the fact that certain search results might be a little offensive... but I do
>> > think that they're probably also not really that useful.
>> >
>> > Given how volatile this has made our search results, my sense is that we're
>> > giving too much weight to how much we're letting page view data affect the
>> > ranking. Is it as simple as tweaking a coefficient so that page views are
>> > still taken into consideration but with lower weight, or do we need to do
>> > something more involved? I created T124722 to track this work, and added it
>> > our list of blockers for a wider rollout of the suggester.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> > --
>> > Dan Garry
>> > Lead Product Manager, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > discovery mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/discovery
