Yes! The "Tell me why your search sucks" sign was a success last year, and I'm looking forward to seeing/hearing all the cool questions folks will ask this time! :)
I also just got a chance to look at Erik's slides (final version <https://upload.wikimedia.org/wikipedia/commons/4/4c/From_Clicks_to_Models_The_Wikimedia_LTR_Pipeline.pdf>) that he presented at Haystack, and I think it might be cool to reprise that presentation in a breakout session...if Erik is up for it. :)

Cheers,

Deb

--
deb tankersley
Program Manager, Engineering
Wikimedia Foundation

On Wed, May 2, 2018 at 1:06 PM, Trey Jones <[email protected]> wrote:

> Deb—We talked about some of these in our Wednesday meeting, but didn't do
> much deciding or prioritizing. After that, at the hackathon travel meeting,
> Rachel reminded us that the hackathon is "a community-focused event" and
> that we as WMF staff should be "supporting, connecting, and helping
> volunteer and affiliate developers." So, I think I'm going to update my
> hackathon participation info to include a link to the list of projects I
> want to work on, and hope that someone from outside the WMF contacts me
> about something. On the learning side, I've already gotten David to agree
> to help me with some of the technical bits I need for some of my
> proposed projects, either before or at the hackathon (yay!). I also hope
> that the "Tell me why your search sucks" sign will encourage people to stop
> and chat with us. I figure random people chatting with us about search and
> anyone who wants to work with us would take precedence over any other
> projects we might prefer to work on at the hackathon, though I plan to fall
> back to my list if I run out of other things to do or people to talk to.
>
> Justin—We can definitely talk about ways to keep improving the ML ranking
> (or other ML approaches for search). I don't know if there's time during
> the hackathon to pull something together—I guess it depends on how complex
> it is.
> More broadly—and Erik can speak more definitively about this—I'd say that
> while there's always some ML-related stuff going on in the background, our Q4
> goals <https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q4#Program_1:_Make_knowledge_more_easily_discoverable>
> are less about Learn-to-Rank/ML, so there may not be much bandwidth for any
> complex projects in the short term. That said, I'm gathering ideas for NLP
> applications for search—which often overlap with ML applications—so if you
> have any ideas (or if anyone else does!), please share them, whether here
> or off-list.
>
> —Trey
>
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
> On Wed, May 2, 2018 at 1:09 PM, Justin Ormont <[email protected]>
> wrote:
>
>> Greetings Deb/Trey/Erik,
>>
>> I'd enjoy joining the discussions on these hackathon topics also.
>>
>> Specifically, I'd like to see if I can help improve WMF's search relevance
>> using additional machine learning techniques/ML-packages.
>>
>> Thanks,
>> --justin
>>
>> On Wed, May 2, 2018 at 8:53 AM, Deborah Tankersley <
>> [email protected]> wrote:
>>
>>> Nice stuff!
>>>
>>> Should we set up a meeting to talk more in depth about this, as we're
>>> about 2 weeks out from the Hackathon right now?
>>>
>>> Cheers,
>>>
>>> Deb
>>>
>>> --
>>> deb tankersley
>>> Program Manager, Engineering
>>> Wikimedia Foundation
>>>
>>> On Wed, May 2, 2018 at 8:39 AM, Trey Jones <[email protected]> wrote:
>>>
>>>> I've got my own list of more language-focused not-necessarily-great
>>>> ideas, in order of my current desire to work on them:
>>>>
>>>> - Mirandese (mwl) analysis plugin built from Portuguese and French
>>>>   parts, plus a stop list provided by an mwl editor
>>>> - plugin to merge high surrogates and low surrogates that get split
>>>>   up by the Chinese analyzer
>>>> - plugin to do automatic homoglyph corrections
>>>> - plugin to do transliteration for languages where it is relatively
>>>>   easy (Serbian was on the list, but it’s already done!—and for very
>>>>   simple mappings this is just a char map)
>>>> - look into ways of automatically generating a stemmer from
>>>>   Wiktionary conjugation/declension data (maybe start with Estonian?)
>>>> - compare the analyzers for the top 5-10 wiki languages by volume,
>>>>   and look for ways to increase consistency among them
>>>> - develop a different statistical approach to detect wrong keyboard
>>>>   typing and build a search-only filter to generate alternative tokens—for
>>>>   Russian/English, Hebrew/English, OR one hand on the wrong home row
>>>> - update RelForge with some additional metrics I’ve been collecting
>>>> - project WordNet or other thesaurus/ontology onto short strings
>>>>   (e.g., Commons descriptions, Wikipedia titles, etc.) to determine useful
>>>>   thesaurus terms and prune the rest
>>>> - recheck differences in unpacked vs monolithic analyzers
>>>>   (eliminating our automatic upgrades, which are 98% likely to have
>>>>   caused the diffs)
>>>> - “Bollywood detector”—identify and map Bollywood movie names into
>>>>   multiple scripts
>>>>
>>>> I was planning to work on the Mirandese analysis plugin and maybe one
>>>> of the next three on the list.
>>>> But if anyone wants to collaborate on any of
>>>> the others, I'm happy to do so.
>>>>
>>>> Trey Jones
>>>> Sr. Software Engineer, Search Platform
>>>> Wikimedia Foundation
>>>>
>>>> On Tue, May 1, 2018 at 6:14 PM, Erik Bernhardson <
>>>> [email protected]> wrote:
>>>>
>>>>> With the hackathon coming up I thought we could ponder what could be
>>>>> done while there. I've been constructing a list of horrible ideas over the
>>>>> last couple weeks:
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Discovery mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/discovery
>>>>
