[
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945944#comment-15945944
]
Michael Nilsson commented on SOLR-10359:
----------------------------------------
The ideas in this ticket are definitely something everyone encounters when
needing to evaluate how well their search is performing. I think the scope of
this enhancement, for a first cut, could be narrowed down a bit though.
1) If you are storing the user interactions + impressions in a parallel Solr
collection, you don't need a separate evaluation component initially. You
could use Solr JSON faceting, the analytics component, or streaming joins
(which can work on databases too) to calculate the numbers instead. The first
cut could probably just provide documentation for the exact requests to send in
order to calculate CTR, etc.
2) Also, you probably won't want to auto-log results returned from Solr as the
impressions at first. As mentioned above, results returned from Solr are not
always 1 to 1 with results displayed. Just like you will be providing a way to
store user interactions on demand via an endpoint, you should probably just
expand that to allow storing user impressions on demand as well.
3) You will need a way to link the user impressions with their interactions.
You could supply a unique search id with the initial result set and let the
client pass that back to you when sending the save impressions request and save
interactions request. However, for the first cut you could make it the
client's responsibility to generate the unique id and pass it back to you.
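As a rough illustration of point 1, the sketch below builds a JSON Facet
request body that counts impressions and clicks per query, then derives CTR
from the facet response. The field names `query` and `relevancy_rating` (0 for
impressions, 1-5 for interactions) are borrowed from the data model proposed in
the ticket and are assumptions about the events collection, as are the facet
labels and the helper names. CTR is taken here as clicks divided by
impressions, which is only one possible definition.

```python
import json


def build_ctr_facet_request(query_field="query", rating_field="relevancy_rating"):
    """Build a JSON Facet request body counting impressions and clicks per query.

    Field names are assumptions about the events collection schema.
    """
    return {
        "query": "*:*",
        "limit": 0,  # only the facet counts are needed, not the documents
        "facet": {
            "per_query": {
                "type": "terms",
                "field": query_field,
                "limit": 100,
                "facet": {
                    # rating 0 marks an impression in the proposed data model
                    "impressions": {"type": "query", "q": f"{rating_field}:0"},
                    # ratings 1-5 mark user interactions
                    "clicks": {"type": "query", "q": f"{rating_field}:[1 TO 5]"},
                },
            }
        },
    }


def ctr_per_query(facet_response):
    """Compute clicks / impressions for each query bucket in a facet response."""
    ctr = {}
    for bucket in facet_response["facets"]["per_query"]["buckets"]:
        impressions = bucket.get("impressions", {}).get("count", 0)
        clicks = bucket.get("clicks", {}).get("count", 0)
        ctr[bucket["val"]] = clicks / impressions if impressions else 0.0
    return ctr


if __name__ == "__main__":
    # The printed body could be POSTed to the events collection's query endpoint.
    print(json.dumps(build_ctr_facet_request(), indent=2))
```

The same counts could presumably be produced with the analytics component or a
streaming expression; JSON faceting is used here only because the request is
easy to show in full.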
For the use cases of Solr that use a federated search across multiple
collections and merge the results into 1 list, points 2 and 3 become more
important. I might query 10 results from each of 3 collections, for a total of
30 results, but only display the top 5 combined on my page. If Solr
auto-generates a search id, I will now have 3 ids instead of 1. Also, there
were only 5 impressions in total, not the 30 that auto-logging would record.
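The federated case can be sketched on the client side as follows. This is a
minimal illustration, not a proposed API: the record fields (`search_id`,
`collection`, `result_id`, `result_position`, `relevancy_rating`) and the
helper names are hypothetical, and merging by a raw `score` field is a
simplification. The client merges the per-collection results, keeps only what
it displays, and tags every impression and later interaction with a single id
that it generated itself.

```python
import heapq
import uuid


def merge_top_k(results_per_collection, k):
    """Merge per-collection result lists by score and keep the k best."""
    all_results = [
        {"collection": name, **doc}
        for name, docs in results_per_collection.items()
        for doc in docs
    ]
    return heapq.nlargest(k, all_results, key=lambda d: d["score"])


def build_impressions(query, displayed, search_id=None):
    """Create one impression record per displayed result, sharing one search id."""
    search_id = search_id or str(uuid.uuid4())  # client-generated id
    return [
        {
            "search_id": search_id,
            "query": query,
            "collection": doc["collection"],
            "result_id": doc["id"],
            "result_position": position,
            "relevancy_rating": 0,  # 0 marks an impression
        }
        for position, doc in enumerate(displayed, start=1)
    ]


def build_interaction(impression, rating):
    """Create an interaction record linked to an impression via its search id."""
    return {**impression, "relevancy_rating": rating}


if __name__ == "__main__":
    # 10 results from each of 3 collections, but only 5 are displayed.
    fetched = {
        f"collection{i}": [{"id": f"c{i}-doc{j}", "score": i * 10 + j} for j in range(10)]
        for i in range(3)
    }
    displayed = merge_top_k(fetched, k=5)
    impressions = build_impressions("laptop", displayed)
    click = build_interaction(impressions[0], rating=1)
    print(len(impressions), click["search_id"] == impressions[0]["search_id"])
```

With this approach there is one search id per page shown to the user, and the
number of stored impressions matches what was actually displayed.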
> User Interactions Logger Component
> ----------------------------------
>
> Key: SOLR-10359
> URL: https://issues.apache.org/jira/browse/SOLR-10359
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Alessandro Benedetti
> Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and
> more important day by day.
> This issue is to put a milestone to integrate online evaluation metrics with
> Solr.
> *Scope*
> Scope of this issue is to provide a set of components able to :
> 1) Collect Search Results impressions ( results shown per query)
> 2) Collect Users interactions ( user interactions on the search results per
> query, e.g. clicks, bookmarking, etc. )
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed :
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the data collected will be
> stored.
> Different data structures can be explored, to keep it simple, a first
> implementation can be a Lucene Index.
> *Data Model*
> The user event can be modelled in the following way :
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1-5 to identify the
> type of user event, the semantic will depend on the domain and use cases
> <test_group> - this can identify a variant, in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it
> processes a request and returns a result set to the user for a query, the
> component will collect the impressions ( results returned) and index them in
> the auxiliary lucene index.
> This will happen in parallel as soon as you return the results to avoid
> affecting the query time.
> Of course an impact on CPU load and memory is expected, will be interesting
> to minimise it.
> * User Events Logging *
> An UpdateHandler will be exposed to accept POST requests and collect user
> events.
> Every time a request is sent, the user event will be indexed in the underlying
> auxiliary Lucene Index.
> * Stats Calculation *
> A RequestHandler will be exposed to be able to calculate stats and
> aggregations for the metrics :
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for our testA and testB to compare.
> Showing stats in total and per query ( to highlight the queries with
> lower/higher CTR).
> The calculations will happen separating the <test_group> for an easy
> comparison.
> Will be important to keep it as simple as possible for a first version, to
> then extend it as much as we like