Nice examples! Though I think I may have been unclear; it's not that the
desired results can't be on page two (or even much further along), but
rather that I expect people to re-examine their query (or just give up,
alas) if they don't see any good results on the first page.

With "their red hot", hopefully they'll notice their typo and correct it.
For "Cities in the San Francisco Bay Area", I don't think slogging through
to result 122 is likely for most users.

CirrusSearch certainly could give better results in these cases, and we
should look into it if these kinds of failed queries are sufficiently
common. Some of our general improvements may also help with these specific
problems. I have a hunch that the "boosted all field" task we're working on
may have a positive effect on the cities in SF problem, for example.


On Wed, Feb 10, 2016 at 3:33 PM, Justin Ormont <[email protected]>
wrote:

> Good hits on page two:
>
> There are a few cases where good results could exist only on page two.
>
> One case is searching with a homophone or other misspelling, e.g.
> "their red hot" instead of "they're red hot" (expected result
> <https://en.wikipedia.org/wiki/They%27re_Red_Hot> -- wikipedia
> <https://en.wikipedia.org/w/index.php?search=their+red+hot&title=Special%3ASearch>
> (pos 22), google
> <https://www.google.com/search?q=their+red+hot&oq=their+red+hot> (pos 1),
> bing <https://www.bing.com/search?q=their+red+hot> (pos 2), ddg
> <https://duckduckgo.com/?q=their+red+hot> (pos 2)).
>
> Another case is when you get an exact string match on incorrect pages, but
> only a non-exact string match on the correct page, e.g. "Cities in the San
> Francisco Bay Area" (expected result
> <https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area>
> -- wikipedia
> <https://en.wikipedia.org/w/index.php?title=Special:Search&search=Cities+in+the+San+Francisco+Bay+Area>
> (pos 122), google
> <https://www.google.com/search?q=Cities+in+the+San+Francisco+Bay+Area>
> (pos 1), bing
> <https://www.bing.com/search?q=Cities+in+the+San+Francisco+Bay+Area>
> (pos 1), ddg <https://duckduckgo.com/?q=Cities+in+the+San+Francisco+Bay+Area>
> (pos 1)).
>
> This pattern occurs mostly for navigational queries (only one correct
> result). For explorative queries, odds are that at least one of the
> relevant results will be on page 1.
>
> There are a couple of less direct cases. For instance, if/once you
> integrate a popularity score, freshness score, importance score, page query
> score, or personalization (e.g. ranking by physical distance from the user
> or by the user's interests), you'll find some examples where incorrect
> results are unhelpfully boosted.
>
> Investigating queries that lead to clicks on page two may turn up
> interesting findings.
>
> --
>
> Knowing the SAT/DSAT click rate vs. position will tell you whether good
> clicks often occur beyond position 10. Then running an experiment with 10
> vs. 20 SERP results may give interesting insights when watching a
> session-success-rate metric (and maybe a time-to-success metric); that is,
> checking whether a click on position 11+ is ever useful, or almost always
> just leads to a requery or abandonment. If you run result-size experiments,
> you can normalize for query latency effects by generating 20 results in
> both arms and displaying only 10 in the 10-result arm.
>
> The need to scroll can make the listed click rates fall off faster. In my
> web browser, as it's currently sized, there are only three results above
> the fold (my open advanced facet block takes a lot of space, so scrolling
> is required for result 4+). Knowing whether, and by how much, the click
> rate drops for results below the fold will also help optimize the number
> of results to display, snippet length, and UI design. You could instrument
> the number of results above the fold.
>
> --
>
> Side note, possible bug: I can't find the page "List of New York
> University alumni
> <https://en.wikipedia.org/wiki/List_of_New_York_University_alumni>" when
> querying "New York University alumni
> <https://en.wikipedia.org/w/index.php?search=New+York+University+alumni&title=Special%3ASearch&go=Go>"
> (screenshot <https://imgur.com/SymW9tv>).
>
> --justin
>
> On Wed, Feb 10, 2016 at 12:04 PM, billinghurst <[email protected]>
> wrote:
>
>> It would also be interesting to know the type of page that becomes their
>> destination:
>>
>> Person
>> Object
>> News
>> Concept
>> Etc.
>>
>> Some are easier to describe and predict, and this aligns with search
>> length too.
>>
>> -- billinghurst
>>
>>
>> On Wed, 10 Feb 2016 22:01 Justin Ormont <[email protected]> wrote:
>>
>>> This is great. Are you tracking any categories that would be interesting
>>> to break the position click rates down by? E.g.: navigational vs.
>>> explorative queries, SAT clicks (satisfied the user's query intent) vs.
>>> DSAT clicks (not satisfied), requery rate (how many times a user
>>> reformulates their query in a session), time to first click, search
>>> session duration, user's country/default language/# of edits, query
>>> length (# of query tokens), # of query results, popular vs. uncommon
>>> queries, high-scoring SERP vs. low-scoring SERP (or a proxy like max BM25F
>>> of the top result), spelling suggestion clicked vs. not clicked, category
>>> of page clicked on, popular pages vs. rarely visited pages, etc.
>>>
>>> Running this experiment on Special:Search is nice, as that page doesn't
>>> automatically redirect when the query exactly matches a page title.
>>>
>>> You can measure positional importance by setting up an A/B test where you
>>> swap positions 2 and 3. Also, a slowdown experiment would tell you the
>>> impact of latency and help focus engineering efforts on either precision
>>> or latency improvements.
>>>
>>> --justin
>>>
>>>
>>> On Tue, Feb 9, 2016 at 9:53 AM, Erik Bernhardson <
>>> [email protected]> wrote:
>>>
>>>> 2 |  34214 | 14.26%
>>>
>>>
>>>
>>> _______________________________________________
>>> discovery mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/discovery