I spent today looking at how to identify and convert queries typed on the
wrong keyboard on the English and Russian Wikipedias.

*Highlights*

Looking for mis-keyboarded queries in the "right" character set (i.e., Latin
on English Wikipedia or Cyrillic on Russian Wikipedia) can explain some
gibberish queries and give some improvement in results, but it's very
expensive because there are so many candidate queries.

Looking for mis-keyboarded queries in the "wrong" character set (i.e.,
Cyrillic on English Wikipedia or Latin on Russian Wikipedia) can explain a
lot of gibberish queries and give better results, especially on Russian
Wikipedia, where possibly more than 1% of queries are accidentally typed on
the wrong keyboard!
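
To make the "wrong" character set case concrete, here is a minimal sketch of
the conversion step. It assumes a US QWERTY layout and the standard Russian
ЙЦУКЕН layout, unshifted keys only, and it lowercases the query for
simplicity. The function and constant names are hypothetical and are not
taken from the notes linked below.

```python
# Unshifted keys of a US QWERTY keyboard and the Russian letters that sit on
# the same physical keys in the standard ЙЦУКЕН layout (an assumption; other
# layouts would need their own tables).
LATIN = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"
CYRILLIC = "йцукенгшщзхъфывапролджэячсмитьбюё"

LATIN_TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)
CYRILLIC_TO_LATIN = str.maketrans(CYRILLIC, LATIN)


def is_cyrillic(ch):
    """True if ch falls in the basic Cyrillic Unicode block."""
    return "\u0400" <= ch <= "\u04ff"


def is_latin(ch):
    """True if ch is an ASCII letter."""
    return ch.isascii() and ch.isalpha()


def rekey_wrong_script(query, wiki_script):
    """Re-key a query whose letters are all in the wrong script.

    wiki_script is "latin" (e.g. English Wikipedia) or "cyrillic"
    (e.g. Russian Wikipedia). Returns the converted query, or None when the
    query is not entirely in the wrong script.
    """
    lowered = query.lower()
    letters = [ch for ch in lowered if ch.isalpha()]
    if not letters:
        return None
    if wiki_script == "latin" and all(is_cyrillic(ch) for ch in letters):
        return lowered.translate(CYRILLIC_TO_LATIN)
    if wiki_script == "cyrillic" and all(is_latin(ch) for ch in letters):
        return lowered.translate(LATIN_TO_CYRILLIC)
    return None
```

For example, rekey_wrong_script("ghbdtn", "cyrillic") gives "привет", and
rekey_wrong_script("цшлшзувшф", "latin") gives "wikipedia". A query already
in the expected script returns None, which is why this case is cheap: only
queries in the wrong script become candidates.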

Limiting the scope to only zero-result queries or perhaps poorly performing
(fewer than three results) queries could be computationally less expensive
and much more effective!
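
As a rough illustration of limiting the scope, the sketch below only tries a
re-keyed query when the original one returns fewer than three results. The
search and rekey arguments are hypothetical callables supplied by the caller
(a search backend and a keyboard-layout converter); nothing here reflects an
actual search API.

```python
MIN_RESULTS = 3  # "fewer than three results" counts as poorly performing


def search_with_rekey_fallback(query, search, rekey, min_results=MIN_RESULTS):
    """Run the query; re-key and retry only if it performs poorly.

    search(query) must return a list of results.
    rekey(query) must return a converted query string, or None if the query
    has no plausible wrong-keyboard reading.
    Returns a (query_used, results) pair.
    """
    results = search(query)
    if len(results) >= min_results:
        # Good enough: skip the extra conversion and the second search.
        return query, results

    candidate = rekey(query)
    if candidate is None or candidate == query:
        return query, results

    rekeyed_results = search(candidate)
    # Keep the re-keyed query only if it actually finds more.
    if len(rekeyed_results) > len(results):
        return candidate, rekeyed_results
    return query, results
```

With this shape, well-performing queries cost exactly one search, and the
second search is paid only for the zero-result or poorly performing ones.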

More details are available at
<https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Typing_on_the_Wrong_Keyboard_Russian_and_English>.

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
