BTW, as I mentioned, the machine learning

On Monday, May 4, 2015, J. Delgado <[email protected]> wrote:

> I totally agree that it depends on the task at hand and the amount/quality
> of the data you can get hold of.
>
> The problem of relevance in a traditional document/semantic information
> retrieval (IR) task is so hard because in most cases there is little or no
> source of truth you could use as training data (unless you use something
> like TREC for a limited set of documents to evaluate). Additionally, the
> feedback data you get from users, if it exists at all, is very noisy. In
> this case prior knowledge, encoded as attribute weights, crafted functions,
> and heuristics, is your best bet. You can, however, mine the content itself
> by leveraging clustering/topic modeling via LDA, which is an unsupervised
> learning algorithm, and use that as input; see the sketch below. Or perhaps
> Labeled-LDA and Multi-Grain LDA, topic models for classification and
> sentiment analysis, which are supervised algorithms, in which case you can
> still use the approach I suggested.
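>
> For the unsupervised route, here is a minimal sketch of turning documents
> into topic-distribution features, assuming gensim is available (the toy
> documents and the topic count are made-up illustrations, not a definitive
> recipe):
>
> # Minimal sketch: derive topic-distribution features from text with LDA.
> # gensim is assumed installed; raw_docs and num_topics are illustrative.
> from gensim import corpora, models
>
> raw_docs = ["solr relevance ranking and scoring",
>             "topic models for document classification",
>             "user clicks as relevance feedback signals"]
> texts = [doc.lower().split() for doc in raw_docs]
>
> dictionary = corpora.Dictionary(texts)
> bow_corpus = [dictionary.doc2bow(t) for t in texts]
>
> # Fit the unsupervised topic model; 2 topics is an arbitrary starting point.
> lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=5)
>
> # Topic proportions per document, usable as dense input features downstream.
> def topic_features(tokens):
>     bow = dictionary.doc2bow(tokens)
>     return dict(lda.get_document_topics(bow, minimum_probability=0.0))
>
> print(topic_features("relevance feedback for ranking".split()))
>
> Those per-document topic proportions can then feed whatever ranking
> function or classifier you build on top.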
>
> However, for search tasks that involve e-commerce, advertisements,
> recommendations, etc., there seems to be much more data that can be
> captured from users' interactions with the system/site and used as signals,
> and users' actions (adding things to wish lists, clicks for more info,
> conversions, etc.) are much more telling about the intent/value the user
> assigns to what is presented to them. Then viewing search as a machine
> learning/multi-objective optimization problem makes sense; a rough sketch
> follows.
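>
> To make that concrete, here is a minimal pointwise sketch, assuming
> scikit-learn (the feature names and numbers are hypothetical), that learns
> a ranking model from click/conversion labels mined from logs:
>
> # Minimal pointwise learning-to-rank sketch (scikit-learn assumed).
> # Each row is one (query, result) impression; label 1 = clicked/converted.
> import numpy as np
> from sklearn.linear_model import LogisticRegression
>
> # Hypothetical features: [bm25_score, title_match, price_rank, ctr_prior]
> X = np.array([[12.3, 1, 0.2, 0.05],
>               [ 4.1, 0, 0.9, 0.01],
>               [ 9.8, 1, 0.5, 0.03],
>               [ 2.2, 0, 0.1, 0.00]])
> y = np.array([1, 0, 1, 0])  # click/conversion labels from the logs
>
> model = LogisticRegression().fit(X, y)
>
> # At query time, rank candidate results by predicted click probability.
> candidates = np.array([[8.7, 1, 0.4, 0.02],
>                        [3.0, 0, 0.6, 0.01]])
> order = np.argsort(-model.predict_proba(candidates)[:, 1])
>
> A real system would use richer features and a pairwise or listwise
> objective, but the shape of the problem is the same.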
>
> My point is that search engines are nowadays used for all of these use
> cases, so it is worth exploring all the avenues raised in this thread.
>
> Cheers,
>
> -- Joaquin
>
> On Mon, May 4, 2015 at 2:31 PM, Tom Burton-West <[email protected]> wrote:
>
>> Hi Doug and Joaquin,
>>
>> This is a really interesting discussion.  Joaquin, I'm looking forward to
>> taking your code for a test drive.  Thank you for making it publicly
>> available.
>>
>> Doug, I'm interested in your pyramid observation. I work with academic
>> search, which has some of the problems of unique queries/information needs
>> and of data sparsity that you mention in your blog post.
>>
>> This article makes a similar argument: massive amounts of user data are so
>> important for modern search engines that the lack of them is essentially a
>> barrier to entry for new web search engines.
>> Ricardo Baeza-Yates and Yoelle Maarek. Usage Data in Web Search: Benefits
>> and Limitations. In Proceedings of SSDBM 2012, Chania, Crete, June 2012.
>> http://www.springerlink.com/index/58255K40151U036N.pdf
>>
>>  Tom
>>
>>
>>> I noticed that information retrieval problems fall into a sort of layered
>>> pyramid. At the topmost point is someone like Google, where the sheer
>>> amount of high-quality user behavior data means that search truly is a
>>> machine learning problem, much as you propose. As you move down the
>>> pyramid, the quality of user data diminishes.
>>>
>>> Eventually you get to a very thick layer of middle-class search
>>> applications that value relevance but have very modest amounts of user
>>> data, or none at all. For most of them, even if they tracked their
>>> searches over a year, they *might* get good data on their top 50 searches.
>>> (I know because they send me the spreadsheet and say "fix it!") The best
>>> use they can make of analytics data is after-action troubleshooting.
>>> Actual user emails complaining about the search can be more useful than
>>> behavior data!
>>>
