[
https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905405#comment-15905405
]
Maciej Szymkiewicz commented on SPARK-19899:
--------------------------------------------
In my opinion a trait for each input category ({{Vector}}, {{array<_>}},
{{array<array<_>>}}) is the way to go. Development overhead is low (these
things are small and easy to test), it is unlikely we'll need many more any
time soon, and this gives us some way to communicate expected input.
I am strongly against using {{Vector}} - it is counterintuitive, requires a
lot of additional effort and without any supported way of mapping from vector
to features (I don't count {{Column}} metadata) it will significantly degrade
user experience. Moreover it won't be useful for {{PrefixSpan}} at all. I
believe that we should acknowledge that pattern mining techniques are
significantly different from the common {{ml}} algorithms and don't hesitate to
reflect that in the API.
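
A minimal sketch of what such per-input traits could look like, assuming Spark ML's public {{Param}} / {{Params}} API is on the classpath. {{HasTransactionsCol}} is the name suggested in the issue description; {{HasSequencesCol}}, the default column names, the validation helpers and {{DemoMiner}} are hypothetical and exist only for illustration. This is not the merged implementation.

```scala
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical trait for algorithms whose input is array<T> (e.g. FPGrowth).
trait HasTransactionsCol extends Params {
  final val transactionsCol: Param[String] =
    new Param[String](this, "transactionsCol", "input column of type array<T>")
  setDefault(transactionsCol, "transactions") // assumed default name

  final def getTransactionsCol: String = $(transactionsCol)

  // Fails fast if the configured column is missing or is not an array.
  def validateTransactionsCol(schema: StructType): Unit = {
    val name = $(transactionsCol)
    val dataType = schema(name).dataType
    require(dataType.isInstanceOf[ArrayType],
      s"Column $name must be array<T> but was ${dataType.simpleString}")
  }
}

// Hypothetical trait for algorithms whose input is array<array<T>> (e.g. PrefixSpan).
trait HasSequencesCol extends Params {
  final val sequencesCol: Param[String] =
    new Param[String](this, "sequencesCol", "input column of type array<array<T>>")
  setDefault(sequencesCol, "sequences") // assumed default name

  final def getSequencesCol: String = $(sequencesCol)

  // Accepts only an array whose elements are themselves arrays.
  def validateSequencesCol(schema: StructType): Unit = {
    val name = $(sequencesCol)
    schema(name).dataType match {
      case ArrayType(_: ArrayType, _) => ()
      case other => throw new IllegalArgumentException(
        s"Column $name must be array<array<T>> but was ${other.simpleString}")
    }
  }
}

// Hypothetical holder class, just enough to exercise the trait.
class DemoMiner(override val uid: String) extends HasTransactionsCol {
  def this() = this(Identifiable.randomUID("demoMiner"))
  def setTransactionsCol(value: String): this.type = set(transactionsCol, value)
  override def copy(extra: ParamMap): DemoMiner = defaultCopy(extra)
}
```

With something like this, the parameter name and its documentation already tell the user that an array column is expected, and a {{VectorUDT}} column passed by mistake is rejected at schema validation rather than at fit time.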
> FPGrowth input column naming
> ----------------------------
>
> Key: SPARK-19899
> URL: https://issues.apache.org/jira/browse/SPARK-19899
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Maciej Szymkiewicz
>
> Current implementation extends {{HasFeaturesCol}}. Personally I find it
> rather unfortunate. Up to this moment we used consistent conventions - if we
> mix-in {{HasFeaturesCol}} the {{featuresCol}} should be {{VectorUDT}}.
> Using the same {{Param}} for an {{array<T>}} (and possibly for
> {{array<arrray<T>>}} once {{PrefixSpan}} is ported to {{ml}}) will be
> confusing for the users.
> I would like to suggest adding a new {{trait}} (let's say
> {{HasTransactionsCol}}) to clearly indicate that the input type differs from
> that of the other {{Estimators}}.