[
https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864872#comment-15864872
]
Nick Pentreath commented on SPARK-14503:
----------------------------------------
It seems {{PrefixSpan}} even takes a different input type, {{Array[Array[T]]}},
versus {{Array[T]}} for {{FPGrowth}}, so the two may be tricky to unify.
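To make the difference concrete, here is a rough sketch of the two record shapes in plain Scala collections (no Spark involved). The {{Item}} alias and the sample values are made up for illustration; the real APIs are generic in the item type.

```scala
object InputShapes {
  // Hypothetical item type, chosen only for this example.
  type Item = String

  // FPGrowth-style input: each record is one transaction, a flat array of items.
  val transactions: Seq[Array[Item]] = Seq(
    Array("a", "b", "c"),
    Array("a", "c")
  )

  // PrefixSpan-style input: each record is an ordered sequence of itemsets,
  // hence one extra level of nesting per record.
  val sequences: Seq[Array[Array[Item]]] = Seq(
    Array(Array("a", "b"), Array("c")),
    Array(Array("a"), Array("c", "b"), Array("a", "b"))
  )
}
```

The extra nesting level carries ordering information between itemsets, which is what a unified input type would have to account for.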
However, we do have a precedent: {{QuantileDiscretizer}} returns a
{{Bucketizer}} as its {{Model}} from {{fit}}. {{Bucketizer}} can also be
instantiated directly and independently, and in theory some other estimator
could return a {{Bucketizer}} as its model too.
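The pattern can be sketched in plain Scala with toy stand-ins. These are not the Spark classes: {{ToyBucketizer}}, {{ToyQuantileDiscretizer}} and {{ToyEqualWidthDiscretizer}} are hypothetical names, and the simplified {{Estimator}} trait here only mirrors the idea of {{fit}} returning a model.

```scala
object SharedModelSketch {
  // Simplified stand-in for an estimator: fit on data, get back a model of type M.
  trait Estimator[M] { def fit(data: Seq[Double]): M }

  // Toy model defined purely by its split points; usable without any estimator.
  final case class ToyBucketizer(splits: Vector[Double]) {
    require(splits.length >= 2, "need at least two split points")
    // Index of the last split that is <= x, clamped to a valid bucket index.
    def transform(x: Double): Int =
      math.max(0, math.min(splits.lastIndexWhere(_ <= x), splits.length - 2))
  }

  // Adds open-ended outer boundaries around the learned inner splits.
  private def withOpenEnds(inner: Seq[Double]): Vector[Double] =
    (Double.NegativeInfinity +: inner.distinct.toVector) :+ Double.PositiveInfinity

  // One estimator: picks inner splits at (approximate) quantiles of the data.
  final class ToyQuantileDiscretizer(numBuckets: Int) extends Estimator[ToyBucketizer] {
    require(numBuckets >= 1, "numBuckets must be positive")
    def fit(data: Seq[Double]): ToyBucketizer = {
      require(data.nonEmpty, "data must be non-empty")
      val sorted = data.sorted.toVector
      val inner = (1 until numBuckets).map(i => sorted((i * sorted.length) / numBuckets))
      ToyBucketizer(withOpenEnds(inner))
    }
  }

  // A second, unrelated estimator that returns the same model type.
  final class ToyEqualWidthDiscretizer(numBuckets: Int) extends Estimator[ToyBucketizer] {
    require(numBuckets >= 1, "numBuckets must be positive")
    def fit(data: Seq[Double]): ToyBucketizer = {
      require(data.nonEmpty, "data must be non-empty")
      val lo = data.min
      val width = (data.max - lo) / numBuckets
      val inner = (1 until numBuckets).map(i => lo + i * width)
      ToyBucketizer(withOpenEnds(inner))
    }
  }
}
```

Both estimators hand back the same model type, and the model can still be built by hand from explicit splits, which is the property being pointed at here.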
So we could perhaps think about both {{FPGrowth}} and {{PrefixSpan}} returning
an {{AssociationRuleModel}} from {{fit}}. It could work if the input can be
generalized to {{Seq[T]}} where for {{FPGrowth}} it would be {{Seq[Item]}} and
for {{PrefixSpan}} it would be {{Seq[Seq[Item]]}}. The output of {{transform}}
for the model would be the predicted items as above. It would expose
{{getFreqItems}} and {{getAssociationRules}} both returning a {{DataFrame}}.
Is there something in the nature of {{PrefixSpan}} vs {{FPGrowth}} that makes
this too difficult? (I'll have to go read the papers when I get some time!)
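A very rough sketch of what such a shared model could look like, in plain Scala rather than against the actual Spark API. Everything here is an assumption for illustration: the {{Rule}} case class, the use of plain collections where the proposal says {{DataFrame}}, and the {{transform}} semantics (suggest the consequents of every rule whose antecedent is fully contained in the input, minus items already present). It only covers the flat {{Seq[Item]}} case; how the nested {{Seq[Seq[Item]]}} case would map onto it is exactly the open question.

```scala
object AssociationRuleModelSketch {
  // Hypothetical item type for the example.
  type Item = String

  // Hypothetical rule representation: antecedent => consequent with a confidence.
  final case class Rule(antecedent: Set[Item], consequent: Set[Item], confidence: Double)

  // Hypothetical shared model; plain collections stand in for DataFrames.
  final case class AssociationRuleModel(
      freqItemsets: Map[Set[Item], Long],
      rules: Seq[Rule]) {

    def getFreqItems: Map[Set[Item], Long] = freqItemsets

    def getAssociationRules: Seq[Rule] = rules

    // Assumed semantics: collect consequents of all rules whose antecedent
    // is a subset of the input items, then drop items already in the input.
    def transform(items: Seq[Item]): Set[Item] = {
      val present: Set[Item] = items.toSet
      val predicted: Set[Item] =
        rules.filter(_.antecedent.subsetOf(present)).flatMap(_.consequent).toSet
      predicted -- present
    }
  }
}
```

Under these assumptions, either estimator would only need to produce frequent itemsets and rules in this common form for the model to be shared.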
That said, it could be pretty complex to try to support this. If so,
unless there's a compelling argument I'd go for [~josephkb]'s suggestion above,
and hide the association rule class for now (can expose later as needed). Then
{{PrefixSpan}} will be totally independent and return its own
{{PrefixSpanModel}} (that may also expose a {{transform}} method that has
similar semantics but different internals).
> spark.ml Scala API for FPGrowth
> -------------------------------
>
> Key: SPARK-14503
> URL: https://issues.apache.org/jira/browse/SPARK-14503
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> This task is the first port of spark.mllib.fpm functionality to spark.ml
> (Scala).
> This will require a brief design doc to confirm a reasonable DataFrame-based
> API, with details for this class. The doc could also look ahead to the other
> fpm classes, especially if their API decisions will affect FPGrowth.