Heya,

Another week, another update from the Search Platform team. This one is for the week starting 2018-04-09.
As always, feedback and questions welcome.

== Discussions ==

=== Search ===
* New search code fully deployed & enabled on Wikidata. [0]

== Events and News ==
* Erik and Trey went to the "OpenSource Connections Haystack Search Relevance Conference" and the "Tom Tom Founders Festival Machine Learning Conference", which were back-to-back in Charlottesville, VA. At Haystack, Erik presented on how we use clickstream information to create training data for our learning-to-rank models. [1] Trey wrote up trip notes (with lots of links) on MediaWiki. [2]

== Other Noteworthy Stuff ==
* Fix for CirrusSearchCheckerJob errors rolled out. [3]
* Stas implemented indexing of Lexemes & Forms for the WikibaseLexeme extension. [4]

== Did you know? ==
* The English verb "to be" is kind of weird: the infinitive "be" and participles "being, been" start with "b-", while the preterite forms "was, were" start with "w-", and the present forms "am, is, are" start with vowels. The conjugations originally come from three or four different verbs! Why "three or four"? Wiktionary disagrees with itself a bit, listing four on the etymology of "is" [5] and three on the etymology of "be". [6] The conflation goes back at least to Proto-Germanic, [7] so German is similarly weird. [8] Dutch has a greatly simplified paradigm, but still shows some trace of the multiple sources. [9] Other languages, including ASL, Arabic, Bengali, Hawaiian, Hebrew, Indonesian, Japanese, Russian, Turkish, and Ukrainian, at least partly avoid this mess by having a zero copula. [10] For search on-wiki, we deal with this problem in part with stemming [11] and stop words.
[12]

[0] https://www.wikidata.org/wiki/Wikidata:Project_chat#Improvements_on_the_search_results
[1] https://commons.wikimedia.org/wiki/File:From_Clicks_to_Models_The_Wikimedia_LTR_Pipeline.pdf
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/April_2018_Conference_Trip_Report
[3] https://phabricator.wikimedia.org/T190958
[4] https://phabricator.wikimedia.org/T189745
[5] https://en.wiktionary.org/wiki/is#Etymology_1
[6] https://en.wiktionary.org/wiki/be#Etymology
[7] https://en.wikipedia.org/wiki/Proto-Germanic_language
[8] https://en.wiktionary.org/wiki/sein#Conjugation
[9] https://en.wiktionary.org/wiki/zijn#Inflection
[10] https://en.wikipedia.org/wiki/Zero_copula
[11] https://en.wikipedia.org/wiki/Stemming
[12] https://en.wikipedia.org/wiki/Stop_words

---

Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly

The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R

Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation

_______________________________________________
Discovery mailing list
Discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery