Anna created SOLR-16596:
---------------------------
Summary: LTR MultipleAdditiveTreeModel does not support missing
feature values
Key: SOLR-16596
URL: https://issues.apache.org/jira/browse/SOLR-16596
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Anna
The current MultipleAdditiveTree model does not support missing feature values.
When a feature value is not supplied, the model treats it as zero.
Other libraries used to train LTR models, such as xgboost, distinguish missing
values from all other values, including zero. They learn at training time how
to treat missing values and add a "missing" branch to the tree that points in
the direction learned to be best in that situation.
It would be nice to support this in Solr MultipleAdditiveTree models as well.
An additional "missing" parameter should be added to the RegressionTreeNode to
determine which direction to take when the feature value is missing.
This change will allow us to differentiate between zero-valued and missing
features.
For example, if the feature is "hotel_avg_review" (a rating between zero and
five stars), we would like the model to behave differently depending on whether
the hotel has no reviews (we do not know if it is good) or has an average
review of zero stars (the hotel is bad).
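A minimal sketch of how a "missing" branch could behave, using the hotel
example above. The Node class, its field names, the "left"/"right" values of
the "missing" parameter, and all thresholds and leaf scores are assumptions
made for illustration; this is not the existing RegressionTreeNode code.

```java
import java.util.Map;

public class MissingBranchSketch {

    /** Hypothetical stand-in for a regression tree node (not Solr's class). */
    public static final class Node {
        final String feature;   // null for a leaf
        final float threshold;  // split point for non-leaf nodes
        final String missing;   // assumed parameter: "left" or "right"
        final Node left;
        final Node right;
        final float value;      // score returned by a leaf

        private Node(String feature, float threshold, String missing,
                     Node left, Node right, float value) {
            this.feature = feature;
            this.threshold = threshold;
            this.missing = missing;
            this.left = left;
            this.right = right;
            this.value = value;
        }

        public static Node leaf(float value) {
            return new Node(null, 0f, null, null, null, value);
        }

        public static Node split(String feature, float threshold, String missing,
                                 Node left, Node right) {
            return new Node(feature, threshold, missing, left, right, 0f);
        }

        /** A feature is "missing" if it is absent from the map or is NaN. */
        public float score(Map<String, Float> features) {
            if (feature == null) {
                return value;
            }
            Float v = features.get(feature);
            Node next;
            if (v == null || v.isNaN()) {
                // Follow the direction stored for missing values.
                next = "left".equals(missing) ? left : right;
            } else {
                // Assumed convention: values at or below the threshold go left.
                next = (v <= threshold) ? left : right;
            }
            return next.score(features);
        }
    }

    /** Illustrative tree; thresholds and leaf scores are made up. */
    public static Node exampleTree() {
        Node inner = Node.split("hotel_avg_review", 4.0f, "left",
                Node.leaf(0.0f),    // unknown or average rating
                Node.leaf(1.5f));   // highly rated
        return Node.split("hotel_avg_review", 0.5f, "right",
                Node.leaf(-2.0f),   // rated around zero stars
                inner);
    }

    public static void main(String[] args) {
        Node tree = exampleTree();
        Map<String, Float> noReviews = Map.of();
        Map<String, Float> zeroStars = Map.of("hotel_avg_review", 0.0f);
        System.out.println("no reviews: " + tree.score(noReviews));
        System.out.println("zero stars: " + tree.score(zeroStars));
    }
}
```

In this sketch a hotel with no reviews ends up in a neutral leaf, while a hotel
rated zero stars ends up in the penalised leaf. With the current behaviour both
would be scored as zero stars.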