Yes! The "Tell me why your search sucks" sign was a success last year, and
I'm looking forward to seeing/hearing all the cool questions folks will ask
this time! :)

I also just got a chance to look at Erik's slides (final version
<https://upload.wikimedia.org/wikipedia/commons/4/4c/From_Clicks_to_Models_The_Wikimedia_LTR_Pipeline.pdf>)
that he presented at Haystack and I think it might be cool to reprise that
presentation in a breakout session...if Erik is up for it. :)

Cheers,

Deb

--

deb tankersley

Program Manager, Engineering

Wikimedia Foundation

On Wed, May 2, 2018 at 1:06 PM, Trey Jones <[email protected]> wrote:

> Deb—We talked about some of these in our Wednesday meeting, but didn't do
> much deciding or prioritizing. After that, at the hackathon travel meeting,
> Rachel reminded us that the hackathon is "a community-focused event" and
> that we as WMF staff should be "supporting, connecting, and helping
> volunteer and affiliate developers." So, I think I'm going to update my
> hackathon participation info to include a link to the list of projects I
> want to work on, and hope that someone from outside the WMF contacts me
> about something. On the learning side, I've already gotten David to agree
> to help me with some of the technical bits I need for some of my
> proposed projects, either before or at the hackathon (yay!). I also hope
> that the "Tell me why your search sucks" sign will encourage people to stop
> and chat with us. I figure random people chatting with us about search and
> anyone who wants to work with us would take precedence over any other
> projects we might prefer to work on at the hackathon, though I plan to fall
> back to my list if I run out of other things to do or people to talk to.
>
> Justin—We can definitely talk about ways to keep improving the ML ranking
> (or other ML approaches for search). I don't know if there's time during
> the hackathon to pull something together—I guess it depends on how complex
> it is. More broadly—and Erik can speak more definitively about this—I'd say
> while there's always some ML-related stuff going on in the background, our
> Q4 goals
> <https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q4#Program_1:_Make_knowledge_more_easily_discoverable>
> are less about Learn-to-Rank/ML, so there may not be much bandwidth for any
> complex projects in the short term. That said, I'm gathering ideas for NLP
> applications for search (which often overlap with ML applications), so if you
> have any ideas (or if anyone else does!), please share them, whether here
> or off-list.
>
> —Trey
>
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
> On Wed, May 2, 2018 at 1:09 PM, Justin Ormont <[email protected]>
> wrote:
>
>> Greetings Deb/Trey/Erik,
>>
>> I'd enjoy joining the discussions on these hackathon topics also.
>>
>> Specifically, I'd like to see if I can help improve WMF's search relevance
>> using additional machine learning techniques/ML-packages.
>>
>> Thanks,
>> --justin
>>
>> On Wed, May 2, 2018 at 8:53 AM, Deborah Tankersley <
>> [email protected]> wrote:
>>
>>> Nice stuff!
>>>
>>> Should we set up a meeting to talk more in depth about this, as we're
>>> about 2 weeks out from the Hackathon right now?
>>>
>>> Cheers,
>>>
>>> Deb
>>>
>>> --
>>>
>>> deb tankersley
>>>
>>> Program Manager, Engineering
>>>
>>> Wikimedia Foundation
>>>
>>> On Wed, May 2, 2018 at 8:39 AM, Trey Jones <[email protected]> wrote:
>>>
>>>> I've got my own list of more language-focused not-necessarily-great
>>>> ideas, in order of my current desire to work on them:
>>>>
>>>>    - Mirandese (mwl) analysis plugin built from Portuguese and French
>>>>    parts, plus a stop list provided by an mwl editor
>>>>    - plugin to merge high surrogates and low surrogates that get split
>>>>    up by the Chinese analyzer
>>>>    - plugin to do automatic homoglyph corrections
>>>>    - plugin to do transliteration for languages where it is relatively
>>>>    easy (Serbian was on the list, but it’s already done! For very simple
>>>>    mappings this is just a char map)
>>>>    - look into ways of automatically generating a stemmer from
>>>>    Wiktionary conjugation/declension data (maybe start with Estonian?)
>>>>    - compare the analyzers for the top 5-10 wiki languages by volume,
>>>>    and look for ways to increase consistency among them
>>>>    - develop a different statistical approach to detect wrong keyboard
>>>>    typing and build a search-only filter to generate alternative tokens—for
>>>>    Russian/English, Hebrew/English, OR one hand on wrong home row
>>>>    - update RelForge with some additional metrics I’ve been collecting
>>>>    - project Wordnet or other thesaurus/ontology onto short strings
>>>>    (e.g., Commons descriptions, Wikipedia titles, etc.) to determine useful
>>>>    thesaurus terms and prune the rest
>>>>    - recheck differences in unpacked vs monolithic analyzers
>>>>    (eliminating our automatic upgrades, which are 98% likely to have caused the
>>>>    diffs)
>>>>    - “Bollywood detector”—identify and map Bollywood movie names into
>>>>    multiple scripts
>>>>
>>>> I was planning to work on the Mirandese analysis plugin and maybe one
>>>> of the next three on the list. But if anyone wants to collaborate on any of
>>>> the others, I'm happy to do so.
>>>>
>>>> Trey Jones
>>>> Sr. Software Engineer, Search Platform
>>>> Wikimedia Foundation
>>>>
>>>> On Tue, May 1, 2018 at 6:14 PM, Erik Bernhardson <
>>>> [email protected]> wrote:
>>>>
>>>>> With the hackathon coming up I thought we could ponder what could be
>>>>> done while there. I've been constructing a list of horrible ideas over the
>>>>> last couple weeks:
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Discovery mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/discovery
>>>>
>>>>
