Here are summarized meeting notes:

* No-go on default fuzzy searching for mobile apps, so as to not hammer the server - prefix search (title-starts-with) to be used; if approaching default fuzzy search as the technology is refined, add a 300ms delay to iOS like for Android, though
* Possibly try this on beta mobile web, alpha mobile web, or a targeted language Wikipedia on the mobile web (maybe a larger beta mobile web language Wikipedia) to see how performance would go
* Search team to examine returning fewer fields in each search result record by default when the srprop mask is not specified (e.g., don't return snippets unless they're requested)
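The 300ms delay in the first bullet is a classic debounce. A minimal sketch in Python (the class name and API here are invented for illustration; the real apps implement this natively on iOS/Android):

```python
import threading

class Debouncer:
    """Fire `fn` only after `delay` seconds pass with no new calls -
    a sketch of the 300 ms key-tap delay discussed in the notes."""

    def __init__(self, delay, fn):
        self.delay = delay
        self.fn = fn
        self._timer = None

    def call(self, *args):
        # Each new keystroke cancels the pending request and restarts the clock.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay, self.fn, args)
        self._timer.start()
```

With a 0.3 s delay, typing "c", "ca", "cat" in quick succession issues a single search for "cat" rather than three.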
-Adam

On Wed, Apr 2, 2014 at 8:11 AM, Nikolas Everett <[email protected]> wrote:

> On Tue, Apr 1, 2014 at 7:19 PM, Adam Baso <[email protected]> wrote:
>
>> +mobile-l
>>
>> mobile-l recipients, if replying, please reply-all in case any people
>> on the CC: line aren't on mobile-l; it would be appreciated.
>>
>> Nik,
>>
>> Thanks for the update. Glad to hear there's even faster performance
>> coming, and also that there's no need to structure too much fallback
>> depending on whether the response time is okay. With any luck, it will
>> be just fast enough. I don't think there'd be too much hammering on the
>> suggest term; only if the result set is insufficient does it seem like
>> it would make sense to orchestrate the client-side (or server-side, for
>> that matter) call. The apps do have a key-tap timer on them to help
>> avoid spurious searching, so that should help. I think I understand the
>> ellipsis-related stuff - parsing the snippet text is no problem, but if
>> there's an even simpler way to get text condensed to the point where
>> there's no work to avoid wrapping on most form factors, cool! ...and if
>> I misunderstood, well, we'll get to the bottom of that on Friday.
>
> I suppose this is close to my heart because I just worked on it, but if
> you chop the snippet on the client side it defeats the logic used to
> pick the "best snippet". That logic isn't that great in Cirrus now, but
> it'll get a whole lot better when we deploy the new highlighter. Right
> now the snippet is always 150 characters or something, +/- 20-ish
> characters on each side to find a word break. We pick the best snippet
> based on hits in the 150-character window. At minimum we should let you
> configure it to something that'll fit better. I suppose the best option
> would be to configure up font widths that matter and then use them to
> chop really accurately.
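Nik's "150 characters, +/- 20-ish to find a word break" rule can be sketched roughly in Python (a toy illustration only, not Cirrus's actual highlighter; the function name and the exact slack handling are my own):

```python
def chop_snippet(text, limit=150, slack=20):
    """Aim for `limit` characters, then back up (within `slack`
    characters) to the nearest word break - the rough figures Nik
    describes for the current Cirrus snippet."""
    if len(text) <= limit + slack:
        return text
    window = text[: limit + slack]
    # Look for a space in the +/- slack band around the limit.
    cut = window.rfind(" ", max(0, limit - slack))
    if cut == -1:
        cut = limit  # no word break nearby; chop hard
    return window[:cut].rstrip() + "…"
```

As the thread notes, doing this on the client discards the server's choice of "best" window, which is why the limit is better made configurable server-side.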
> For the most part that sounds pretty simple and quick to implement, so
> long as we're OK with estimates that ignore stuff like ligatures.
>
> Do the type timers fire the hit on the leading character or on a slight
> hesitation? Prefix search on the site fires on the leading character and
> then <cancels> the request if the user types more. That is silly because
> I can't cancel the request on the backend... If it triggers on a
> hesitation then we should just plow ahead, I think. If it triggers on
> leading characters then we should totally cache requests shorter than N
> characters - 3 or 5 or something.
>
> From the code:
>
> // If we were constraining the namespace set, we would probably use
> // 0|1|2|3|4|5|6|7|12|13|14|15|100|101|108|109|446|447
> // to keep it related more to article, article talk, policy,
> // policy talk, help, and help talk types of resources.
> // The odd-numbered Talk pages could even be withheld, but that's
> // sort of pointless when the number of backlinks to them is
> // likely to be small, meaning they won't turn up too much
> // unless they're a result(set) of last resort, or the user
> // went to the trouble of prefix namespace searching such as Talk:Cats.
> // But realistically, it's probably easier to just stick to
> // not defining a namespace constraint set, and thereby (likely)
> // getting more pre-cached responses, due to other consumers leading
> // or following suit. There's a school of thought, or there could be,
> // that says only namespace 0 should be searched here, as it's
> // the core article content. But users may practically want
> // categories, too. And such logic spirals out from there.
> // If we were instead using the opensearch API and were seeking
> // parity with the desktop and mobile web experience, we should
> // indeed as of 27-March-2014 only be searching namespace 0.
> // But as CirrusSearch will be the norm and server load is expected
> // to handle things just fine (no fallback is necessary per Search team),
> // higher quality search results can now be obtained anyway.
>
> Cirrus searches all wgContentNamespaces by default and it is optimized
> to do so. All non-content namespaces are in another index, so we don't
> have to pay attention to it during the request. We also don't have to
> filter by namespace at all.
>
> Each namespace has a weight factor that influences its position. That
> factor often ends up being more important than links. Links are "score *
> log(incoming_links + 2)" and the weights vary from "score * 1" (MAIN) to
> "score * 0.0025" (TEMPLATE_TALK). Our power users expect these because
> lsearchd did it. Mobile users, who knows.
>
> // With all of this considered, we want a request of the following format
> //
> // en.m.wikipedia.org/w/api.php?action=query&list=search&srsearch=cats&srprop=snippet|sectiontitle&srlimit=15&srbackend=CirrusSearch&format=json
> // Note that MobileFrontend's use of opensearch has its result
> // set limited at 15. Note also that the 'srprop' only keeps 'snippet'
> // and 'sectiontitle', plus the 'title' field which is always implicit.
> // This buys us some additional features once we're ready for them,
> // all the while populating the cache.
> // We probably also will want to add 'srinterwiki=1' in some future
> // state so that users don't have to change their language-to-search
> // setting. As it is, 'srinterwiki' is not yet in place
> // and the format of such results may look a bit different,
> // so it's probably best to hold off on 'srinterwiki=1'. We are
> // not yet using the snippets and section titles, but let's get the
> // cache populated for our sake and everyone else's sake.
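The request described in the comment above could be assembled like so (a sketch: the helper name and the https scheme are assumptions of mine, while the parameters are copied from the thread):

```python
from urllib.parse import urlencode

def cirrus_search_url(term, host="en.m.wikipedia.org"):
    """Build the list=search request discussed in the thread.
    `host` defaults to the mobile English Wikipedia endpoint
    used in the example."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srprop": "snippet|sectiontitle",  # 'title' is always implicit
        "srlimit": 15,                     # matches opensearch's limit
        "srbackend": "CirrusSearch",
        "format": "json",
    }
    return "https://%s/w/api.php?%s" % (host, urlencode(params))
```

Keeping the parameter set identical across consumers is what makes the shared-cache argument in the comments work: the more clients issue byte-identical requests, the more cache hits everyone gets.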
> Interwiki is coming but I'd give it a few months, I think.
>
> // NOTE:
> // Although as of 27-March-2014 it seems that suggestions may not be
> // coming back for CirrusSearch as frequently as for Lucene, that's
> // probably just an artifact of relatively lower training of suggestions.
> // In other words, it's likely that the suggestion pairing will grow.
> // Currently, we're not examining
> // [@"query"][@"searchinfo"][@"suggestion"],
> // but we could. There are two cases for the suggestion.
> // 1. When the result set is of length 0, just fire off a search with
> //    the suggestion. This is the case where the user probably
> //    misspelled something.
> // 2. When the result set is of short length (less than 5?), fire
> //    another search with the suggestion, and then collate those search
> //    results /after/ the first result set.
>
> The suggestion is actually better than you give it credit for: even if
> lots of results show up, providing the suggestion might be useful. It
> comes from redirect and title names, and it'll suggest combinations that
> work. So if the user searches for "picket's charge" it'll suggest
> "pickett's charge" even though there are plenty of results for the first
> term. The results for the second term are better.
>
> The reason you get different results is because the implementations are
> vastly different. The Cirrus implementation has less tuning but is "more
> modern". Whatever that is worth.
>
> Nik
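The two suggestion cases in the NOTE above might be sketched as follows (a hypothetical helper: `search` stands in for whatever issues the API call, and the less-than-5 threshold is the comment's own guess):

```python
def apply_suggestion(results, suggestion, search, min_results=5):
    """Combine a result set with the API's spelling suggestion.

    Case 1: empty result set -> re-search with the suggestion
            (user probably misspelled something).
    Case 2: short result set (< min_results) -> search the suggestion
            and collate its results *after* the originals.
    Otherwise the original results are kept as-is.
    """
    if suggestion is None:
        return results
    if not results:
        return search(suggestion)  # case 1
    if len(results) < min_results:
        extra = [r for r in search(suggestion) if r not in results]
        return results + extra     # case 2: collate after
    return results
```

Nik's point in the reply is that this under-uses the suggestion: even a full result set for "picket's charge" can be worth supplementing, since the suggested "pickett's charge" yields better results.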
_______________________________________________
Mobile-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mobile-l
