[
https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896933#comment-15896933
]
Nick Pentreath edited comment on SPARK-14409 at 3/6/17 9:06 AM:
----------------------------------------------------------------
I've thought about this a lot over the past few days, and I think the approach
should be in line with that suggested by [~roberto.mirizzi] & [~danilo.ascione].
*Goal*
Provide a DataFrame-based ranking evaluator that is general enough to handle
common scenarios such as recommendations (ALS), search ranking, and ad click
prediction, using ranking metrics (e.g. recent Kaggle competitions for
illustration: [Outbrain Ad Clicks using
MAP|https://www.kaggle.com/c/outbrain-click-prediction#evaluation], [Expedia
Hotel Search Ranking using
NDCG|https://www.kaggle.com/c/expedia-personalized-sort#evaluation]).
*RankingEvaluator input format*
{{evaluate}} would take a {{DataFrame}} with columns:
* {{queryCol}} - the column containing "query id" (e.g. "query" for cases such
as search ranking; "user" for recommendations; "impression" for ad click
prediction/ranking, etc);
* {{documentCol}} - the column containing "document id" (e.g. "document" in
search, "item" in recommendation, "ad" in ad ranking, etc);
* {{labelCol}} (or maybe {{relevanceCol}} to be more precise) - the column
containing the true relevance score for a query-document pair (e.g. in
recommendations this would be the "rating"). This column will only be used for
filtering out "irrelevant" documents from the ground-truth set (see Param
{{goodThreshold}} mentioned
[above|https://issues.apache.org/jira/browse/SPARK-14409?focusedCommentId=15826901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15826901]);
* {{predictionCol}} - the column containing the predicted relevance score for a
query-document pair. The predicted ids will be ordered by this column for
computing ranking metrics (order matters for the predictions but generally not
for the ground truth, which is treated as a set).
The reasoning is that this format is flexible & generic enough to encompass the
diverse use cases mentioned above.
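To make the proposed surface a little more concrete, here is a rough sketch of what the evaluator's params might look like; the class name, param set and defaults are illustrative only, not a final API:
{code}
// Hypothetical sketch only: names and defaults are illustrative, not a final API.
import org.apache.spark.ml.evaluation.Evaluator
import org.apache.spark.ml.param.{DoubleParam, IntParam, Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.Dataset

class RankingEvaluator(override val uid: String) extends Evaluator {
  def this() = this(Identifiable.randomUID("rankingEval"))

  val queryCol = new Param[String](this, "queryCol", "query id column (user, query, impression, ...)")
  val documentCol = new Param[String](this, "documentCol", "document id column (item, document, ad, ...)")
  val labelCol = new Param[String](this, "labelCol", "true relevance column, used only to filter the ground-truth set")
  val predictionCol = new Param[String](this, "predictionCol", "predicted relevance column, used to order predictions")
  val goodThreshold = new DoubleParam(this, "goodThreshold", "minimum relevance for a document to count as ground truth")
  val k = new IntParam(this, "k", "cut-off for top-k metrics")
  val metricName = new Param[String](this, "metricName", "metric to compute, e.g. map, ndcg@k, precision@k")

  override def evaluate(dataset: Dataset[_]): Double = ???  // see the window/UDF sketches below
  override def copy(extra: ParamMap): RankingEvaluator = defaultCopy(extra)
  override def isLargerBetter: Boolean = true
}
{code}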
Here is an illustrative example from recommendations as a special case:
{code}
+------+-------+------+----------+
|userId|movieId|rating|prediction|
+------+-------+------+----------+
| 230| 318| 5.0| 4.2403245|
| 230| 3424| 4.0| null|
| 230| 81191| null| 4.317455|
+------+-------+------+----------+
{code}
You will notice that the {{rating}} and {{prediction}} columns can be {{null}}.
This is by design. There are three cases shown above:
# 1st row indicates a query-document (user-item) pair that occurs in *both* the
ground-truth set and the top-k predictions;
# 2nd row indicates a user-item pair that occurs in the ground-truth set, but
*not* in the top-k predictions;
# 3rd row indicates a user-item pair that *does not* occur in the ground-truth
set, but *does* occur in the top-k predictions.
*Note* that while technically the input allows both of these columns to be
{{null}} in the same row, in practice that won't occur, since a query-document
pair must appear in at least one of the ground-truth set or the predictions. If
such a row does occur for some reason, it can simply be ignored.
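For concreteness, here is a rough sketch of how such an input could be assembled for the recommendation case. The {{ratings}} and {{topKPredictions}} DataFrames and their column names are assumed for illustration (the latter being e.g. the output of the recommend-all methods proposed in SPARK-13857):
{code}
import org.apache.spark.sql.DataFrame

// Assumed inputs (illustrative): ground-truth ratings and per-user top-k predicted items.
val groundTruth: DataFrame = ratings.select("userId", "movieId", "rating")
val topK: DataFrame = topKPredictions.select("userId", "movieId", "prediction")

// A full outer join yields exactly the three row types shown above: pairs in both sets,
// pairs only in the ground truth (null prediction), and pairs only in the top-k
// predictions (null rating).
val evalInput = groundTruth.join(topK, Seq("userId", "movieId"), "full_outer")
{code}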
*Evaluator approach*
The evaluator will apply a window function partitioned by {{queryCol}} and
ordered by {{predictionCol}} within each query. Then, {{collect_list}} can be used to
arrive at the following intermediate format:
{code}
+------+--------------------+--------------------+
|userId| true_labels| predicted_labels|
+------+--------------------+--------------------+
| 230|[318, 3424, 7139,...|[81191, 93040, 31...|
+------+--------------------+--------------------+
{code}
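A rough sketch of that aggregation, reusing the recommendation column names and the {{evalInput}} DataFrame from the join sketch above (the {{k}} and {{goodThreshold}} values are illustrative; the real evaluator would take them as params):
{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val k = 10                 // illustrative top-k cut-off
val goodThreshold = 3.0    // illustrative relevance threshold

// Rank predicted documents within each query by descending predicted relevance.
val byQuery = Window.partitionBy("userId").orderBy(col("prediction").desc)

val intermediate = evalInput
  .withColumn("rank", row_number().over(byQuery))
  .groupBy("userId")
  .agg(
    // Ground-truth set: documents whose true relevance passes the threshold; order is
    // ignored, and collect_list drops the nulls produced by non-matching rows.
    collect_list(when(col("rating") >= goodThreshold, col("movieId"))).as("true_labels"),
    // Predicted list: top-k documents tagged with their rank, sorted by rank, ids extracted.
    sort_array(collect_list(when(col("prediction").isNotNull && col("rank") <= k,
      struct(col("rank"), col("movieId"))))).getField("movieId").as("predicted_labels")
  )
{code}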
*Relationship to RankingMetrics*
Technically the intermediate format above is the same as that used by
{{RankingMetrics}}, so perhaps we could simply wrap the {{mllib}} version.
*Note* however that the {{mllib}} class is parameterized by the type of
"document": {code}RankingMetrics[T]{code}
I believe for the generic case we must support both {{NumericType}} and
{{StringType}} for id columns (rather than restricting to {{Int}} as in Danilo's
& Roberto's versions above). So either:
# the evaluator must be similarly parameterized; or
# we will need to rewrite the ranking metrics computations as UDFs, along the lines of:
{code}udf { (predicted: Seq[Any], actual: Seq[Any]) => ... }{code}
I strongly prefer option #2 as it is more flexible and in keeping with the
DataFrame style of Spark ML components (as a side note, this will give us a
chance to review the implementations & naming of metrics, since there are some
issues with a few of the metrics).
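To illustrate option #2, here is a minimal sketch of one metric (precision@k) written against the intermediate format above; the exact metric definitions would be settled as part of the review just mentioned, and {{k}} is the same illustrative cut-off as in the earlier sketch:
{code}
import org.apache.spark.sql.functions.{avg, col, udf}

// Sketch of a type-agnostic precision@k: ids are compared as opaque values, so this
// works for both numeric and string id columns without parameterizing the evaluator.
val precisionAtK = udf { (predicted: Seq[Any], actual: Seq[Any]) =>
  val actualSet = actual.toSet
  if (actualSet.isEmpty) 0.0
  else predicted.take(k).count(actualSet.contains).toDouble / k
}

// Per-query metric values, averaged across queries for the final evaluate() result.
val result = intermediate
  .select(avg(precisionAtK(col("predicted_labels"), col("true_labels"))).as("precisionAtK"))
{code}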
That is my proposal (sorry Yong, this is quite different now from the work
you've done in your PR). If Yong or Danilo has time to update his PR in this
direction, let me know.
cc [~josephkb] FYI
Thanks!
> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
> Key: SPARK-14409
> URL: https://issues.apache.org/jira/browse/SPARK-14409
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Nick Pentreath
> Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and potentially in other settings).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.