[ 
https://issues.apache.org/jira/browse/SOLR-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019678#comment-16019678
 ] 

Alessandro Benedetti commented on SOLR-10717:
---------------------------------------------

{quote}
we'd be turning a number into a possibly-missing-number
{quote}

This is an interesting point, and we need to go back to the training 
library (and the related algorithms) for a comprehensive analysis.
At the moment, how a model is trained on sparse training vectors (where some 
feature values are missing from some feature vectors) varies from 
implementation to implementation.
If we introduce a "NaN missing feature value", I agree we would need to 
account for it in the models (and consequently the training algorithms must 
handle missing values coherently to be compatible).
To be consistent, we should investigate how many training algorithms support 
missing values and how missing values are represented in the models.
This could be a rough path, which could of course be simplified by treating 
NaN as zero (which is incorrect, but is often the default in some 
libraries).
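
To make the NaN-versus-zero trade-off concrete, here is a minimal sketch in 
plain Python. It is not LTR plugin code: the linear scorer, the weights and 
the helper names (linear_score, nan_to_zero) are assumptions made up for 
illustration. It only shows that a NaN feature value propagates through a 
dot-product score unless the model handles it, and that mapping NaN to zero 
silently gives the missing feature the same meaning as a real zero.

```python
import math

def linear_score(weights, features):
    """Dot product of model weights and feature values (hypothetical scorer)."""
    return sum(w * f for w, f in zip(weights, features))

def nan_to_zero(features):
    """Replace NaN (missing) feature values with 0.0."""
    return [0.0 if math.isnan(f) else f for f in features]

# Hypothetical model weights and a feature vector whose second value is missing.
weights = [0.5, 2.0, -1.0]
features = [1.0, float("nan"), 3.0]

# Without special handling the NaN poisons the whole score.
raw_score = linear_score(weights, features)

# Treating NaN as zero yields a finite score, but the model can no longer
# tell "missing" apart from a genuine feature value of 0.
zeroed_score = linear_score(weights, nan_to_zero(features))
```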

{quote}
3) Add a parameter (ignoreEfiErrors) that the user can set to true if s/he 
wants 1) otherwise 2).
{quote}

I agree this adds a little configuration complexity, but we would probably 
gain some flexibility as well.
If I remember correctly, we already allow users to set a default value for a 
value feature that takes an EFI as input.
We could extend this to let the admin configure a default for other kinds of 
features as well.
Then, if the EFI is missing, the default value is assigned.
I think a *requireEfi* configuration (defaulting to true) would imply:

requireEfi = true (default)
1) take the EFI as input
2) if the EFI is missing but a default is defined, use the default
3) if the EFI is missing and no default is defined, return an error

requireEfi = false
1) take the EFI as input
2) if the EFI is missing but a default is defined, use the default
3) if the EFI is missing and no default is defined, assign 0 as the feature 
value (or NaN, if we manage to have a full training-model stack that 
supports NaN)
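
The two cases above can be sketched as a single resolution function. This is 
only an illustration of the proposal in plain Python, not existing Solr 
code: requireEfi is the parameter proposed in this comment, and 
resolve_feature_value, MissingEfiError and missing_value are hypothetical 
names.

```python
class MissingEfiError(ValueError):
    """Raised when a required EFI is absent and no default is configured."""

def resolve_feature_value(efi_value, default_value=None,
                          require_efi=True, missing_value=0.0):
    """Resolve a feature value following the proposed requireEfi rules.

    efi_value     -- value passed at query time, or None if not provided
    default_value -- configured default for the feature, or None
    require_efi   -- the proposed requireEfi flag (defaults to True)
    missing_value -- fallback when requireEfi is false (0, or NaN if the
                     whole training/model stack supports it)
    """
    # 1) take the EFI as input
    if efi_value is not None:
        return float(efi_value)
    # 2) EFI missing but a default is defined: use the default
    if default_value is not None:
        return float(default_value)
    # 3) EFI missing and no default: error or fallback, depending on the flag
    if require_efi:
        raise MissingEfiError("EFI missing and no default value configured")
    return missing_value
```

With these assumptions, resolve_feature_value(None, require_efi=False) 
falls back to 0.0, while the same call with require_efi left at its default 
raises MissingEfiError.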


> Learning to rank: a query will fail if the feature vector is requested 
> without providing external feature information parameters
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10717
>                 URL: https://issues.apache.org/jira/browse/SOLR-10717
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.5.1
>            Reporter: Diego Ceccarelli
>            Priority: Minor
>
> In LTR some features can depend on External Feature Information (EFI) that 
> has to be provided at query time. If we query Solr only to retrieve the 
> feature vectors for the documents (without doing reranking), and without 
> providing all the external feature information used in the feature store, 
> the query will fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
