Heya,

Another week, another update from the Search Platform team. This one
is for the week starting 2018-04-09.

As always, feedback and questions welcome.

== Discussions ==

=== Search ===
* New search code fully deployed & enabled on Wikidata. [0]

== Events and News ==
* Erik and Trey went to the "OpenSource Connections Haystack Search
Relevance Conference" and the "Tom Tom Founders Festival Machine Learning
Conference", which were held back-to-back in Charlottesville, VA. At
Haystack, Erik presented on how we use clickstream information to create
training data for our learning-to-rank models. [1] Trey wrote up trip
notes, with lots of links, on MediaWiki. [2]

== Other Noteworthy Stuff ==
* Fix for CirrusSearchCheckerJob errors rolled out. [3]
* Stas implemented indexing of Lexemes & Forms for the WikibaseLexeme extension. [4]

== Did you know? ==
* The English verb "to be" is kind of weird: the infinitive "be" and
participles "being, been" start with "b-", while the preterite forms
"was, were" start with "w-", and the present forms "am, is, are" start
with vowels. The conjugations originally come from three or four
different verbs! Why "three or four"? Wiktionary disagrees with itself
a bit, listing four on the etymology of "is" [5] and three on the
etymology of "be". [6] The conflation goes back at least to
Proto-Germanic, [7] so German is similarly weird. [8] Dutch has a
greatly simplified paradigm, but still shows some trace of the
multiple sources. [9] Other languages, including ASL, Arabic, Bengali,
Hawaiian, Hebrew, Indonesian, Japanese, Russian, Turkish, and
Ukrainian, at least partly avoid this mess by having a zero copula.
[10] For search on-wiki, we deal with this problem in part with
stemming [11] and stop words. [12]

[0] https://www.wikidata.org/wiki/Wikidata:Project_chat#Improvements_on_the_search_results
[1] https://commons.wikimedia.org/wiki/File:From_Clicks_to_Models_The_Wikimedia_LTR_Pipeline.pdf
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/April_2018_Conference_Trip_Report
[3] https://phabricator.wikimedia.org/T190958
[4] https://phabricator.wikimedia.org/T189745
[5] https://en.wiktionary.org/wiki/is#Etymology_1
[6] https://en.wiktionary.org/wiki/be#Etymology
[7] https://en.wikipedia.org/wiki/Proto-Germanic_language
[8] https://en.wiktionary.org/wiki/sein#Conjugation
[9] https://en.wiktionary.org/wiki/zijn#Inflection
[10] https://en.wikipedia.org/wiki/Zero_copula
[11] https://en.wikipedia.org/wiki/Stemming
[12] https://en.wikipedia.org/wiki/Stop_words

---

Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.

https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly

The archive of all past updates can be found on MediaWiki.org:

https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" or
"Volunteer needed" in Phabricator.

[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R


Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation

_______________________________________________
Discovery mailing list
Discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
