[
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865107#comment-17865107
]
Eric Pugh commented on SOLR-10359:
----------------------------------
I wanted to share some work that I've been doing in this space. As part of
another project, I've been able to contribute to a standard we are calling
"User Behavior Interactions" for tracking what users are doing. This standard,
which is NOT tied to any specific search engine, like Solr, is documented at
[https://github.com/o19s/ubi]. There is a draft PR for implementing UBI for
Solr here: [https://github.com/apache/solr/pull/2452]
I hope that in the latter half of 2024 we'll be publishing some Jupyter
notebook-style demonstration code for taking UBI-based data and producing
implicit judgements from that data ;).
> User Interactions Logging Module
> --------------------------------
>
> Key: SOLR-10359
> URL: https://issues.apache.org/jira/browse/SOLR-10359
> Project: Solr
> Issue Type: New Feature
> Reporter: Alessandro Benedetti
> Priority: Major
> Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and
> more important day by day.
> This issue sets a milestone for integrating online evaluation metrics with
> Solr.
> *Scope*
> The scope of this issue is to provide a set of components able to:
> 1) Collect search result impressions (results shown per query)
> 2) Collect user interactions (user interactions on the search results per
> query, e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
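As a small illustration of the DCG metric named in item 3, here is a minimal
sketch in Python. It uses the common formulation in which the gain at rank i
is discounted by log2(i + 1). Feeding it graded relevance values (for example
ones derived from the interaction data) is an assumption of this sketch, and
the function name is hypothetical, not part of the proposal.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted Cumulative Gain for relevance grades listed in rank order.

    Uses the common form: sum over ranks i (1-based) of rel_i / log2(i + 1).
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


# Hypothetical grades for the top four results of one query.
score = dcg([3, 2, 0, 1])
```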
> *Technical Design*
> A SearchComponent can be designed:
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the collected data will be
> stored.
> Different data structures can be explored; to keep it simple, a first
> implementation can be a Lucene index.
> *Data Model*
> The user event can be modelled in the following way:
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value from 1 to 5 to identify the
> type of user event; the semantics will depend on the domain and use case
> <test_group> - this can identify a variant in A/B testing
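To make the data model above concrete, here is a minimal sketch of the event
record in Python. The field names come from the description; the class name,
the field types, the optional test group, and the validation rules are
assumptions of this sketch rather than anything the issue specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class UserEvent:
    """One impression or interaction, mirroring the fields listed above."""

    query: str
    result_id: str
    result_position: int
    timestamp: datetime
    relevancy_rating: int  # 0 = impression, 1-5 = domain-specific event type
    test_group: Optional[str] = None  # assumed optional: A/B variant, if any

    def __post_init__(self) -> None:
        # Assumed validation: the issue only states the 0 and 1-5 ranges.
        if not 0 <= self.relevancy_rating <= 5:
            raise ValueError("relevancy_rating must be between 0 and 5")
        if self.result_position < 0:
            raise ValueError("result_position must not be negative")

    @property
    def is_impression(self) -> bool:
        return self.relevancy_rating == 0


# Hypothetical example: a click (rated 1 here) on the second result.
click = UserEvent(
    query="solr facets",
    result_id="doc-42",
    result_position=2,
    timestamp=datetime(2024, 7, 1, tzinfo=timezone.utc),
    relevancy_rating=1,
    test_group="testA",
)
```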
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it
> processes a request and returns a result set for a query to the user, the
> component will collect the impressions (results returned) and index them in
> the auxiliary Lucene index.
> This will happen in parallel, as soon as the results are returned, to avoid
> affecting the query time.
> Of course, an impact on CPU load and memory is expected; it will be worth
> minimising it.
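The idea of logging impressions off the request path can be sketched as
follows. This is only an illustration in Python, not Solr code: the class and
method names are hypothetical, a plain list stands in for the auxiliary
Lucene index, and a single background worker is an assumed design choice to
keep writes ordered.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional, Sequence


class ImpressionLogger:
    """Records one impression per returned result on a background thread."""

    def __init__(self, store: List[dict], max_workers: int = 1) -> None:
        self._store = store  # stand-in for the auxiliary index
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def log_async(
        self,
        query: str,
        result_ids: Sequence[str],
        test_group: Optional[str] = None,
    ) -> Future:
        # Capture the time on the request thread, then hand off the write
        # so the caller can return results without waiting for it.
        ts = time.time()
        return self._pool.submit(
            self._write, query, list(result_ids), test_group, ts
        )

    def _write(self, query, result_ids, test_group, ts) -> None:
        for position, result_id in enumerate(result_ids, start=1):
            self._store.append(
                {
                    "query": query,
                    "result_id": result_id,
                    "result_position": position,
                    "timestamp": ts,
                    "relevancy_rating": 0,  # 0 marks an impression
                    "test_group": test_group,
                }
            )

    def close(self) -> None:
        # Waits for pending writes before shutting down.
        self._pool.shutdown(wait=True)


store: List[dict] = []
logger = ImpressionLogger(store)
logger.log_async("solr facets", ["doc-1", "doc-2"], test_group="testA")
logger.close()
```

The queue inside the executor is unbounded here, so a real component would
also need some back-pressure or batching to limit the memory impact.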
> *User Events Logging*
> An UpdateHandler will be exposed to accept POST requests and collect user
> events.
> Every time a request is sent, the user event will be indexed in the underlying
> auxiliary Lucene index.
> *Stats Calculation*
> A RequestHandler will be exposed to calculate stats and aggregations for
> the metrics:
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for testA and testB so they can be
> compared, showing stats in total and per query (to highlight the queries with
> lower/higher CTR).
> The calculations will be separated by <test_group> for easy comparison.
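The aggregation behind a request like the /evaluation example above could
look roughly like this sketch in Python. It assumes CTR is computed as
interactions divided by impressions and that every event with a non-zero
relevancy_rating counts as a click; the issue does not fix either point, and
the function and type names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Sequence


class Event(NamedTuple):
    query: str
    relevancy_rating: int  # 0 = impression, non-zero = interaction
    test_group: str


def _ctr(clicks: int, impressions: int) -> Optional[float]:
    # None when there are no impressions, to avoid dividing by zero.
    return clicks / impressions if impressions else None


def ctr_by_group(events: Iterable[Event], groups: Sequence[str]) -> Dict:
    """CTR in total and per query, computed separately for each test group."""
    # group -> query -> [impressions, clicks]
    counts = {g: defaultdict(lambda: [0, 0]) for g in groups}
    for e in events:
        if e.test_group not in counts:
            continue  # ignore groups that were not requested
        slot = counts[e.test_group][e.query]
        if e.relevancy_rating == 0:
            slot[0] += 1
        else:
            slot[1] += 1

    report = {}
    for group, per_query in counts.items():
        total_impressions = sum(c[0] for c in per_query.values())
        total_clicks = sum(c[1] for c in per_query.values())
        report[group] = {
            "total": _ctr(total_clicks, total_impressions),
            "per_query": {q: _ctr(c[1], c[0]) for q, c in per_query.items()},
        }
    return report
```

Calling ctr_by_group(events, ["testA", "testB"]) returns one entry per
requested group, each with a "total" value and a "per_query" mapping, which
is the shape needed to spot queries with unusually low or high CTR.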
> It will be important to keep this as simple as possible for a first version,
> and then extend it as much as we like.