Hi Doug and Joaquin,

This is a really interesting discussion.  Joaquin, I'm looking forward to
taking your code for a test drive.  Thank you for making it publicly
available.

Doug,  I'm interested in your pyramid observation.  I work on academic
search, which shares some of the problems you mention in your blog post:
unique queries/information needs and data sparsity.

This article makes a similar argument: massive amounts of user data are
so important for modern search engines that they are essentially a barrier
to entry for new web search engines.
Usage Data in Web Search: Benefits and Limitations. Ricardo Baeza-Yates and
Yoelle Maarek.  In Proceedings of SSDBM'2012, Chania, Crete, June 2012.
http://www.springerlink.com/index/58255K40151U036N.pdf

 Tom


> I noticed that information retrieval problems fall into a sort-of layered
> pyramid. At the topmost point is someone like Google, where the sheer
> amount of high-quality user behavior data means that search truly is a
> machine learning problem, much as you propose. As you move down the
> pyramid, the quality of user data diminishes.
>
> Eventually you get to a very thick layer of middle-class search
> applications that value relevance but have very modest amounts of user
> data, or none at all. For most of them, even if they tracked their searches
> over a year, they *might* get good data for their top 50 searches. (I know
> because they send me the spreadsheet and say "fix it!") The best use they
> can make of analytics data is after-action troubleshooting. Actual user
> emails complaining about the search can be more useful than behavior data!
>
