[
https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905367#comment-15905367
]
yuhao yang commented on SPARK-19899:
------------------------------------
Thanks for the suggestion. I'm neutral on this. I'm not sure we need a new trait
for each data type. Also, how do you plan to handle array<array<T>> for
PrefixSpan? This does remind me that we could support vector input for FPGrowth,
and that the documentation could note that "the order of items in each record
will not affect the training process and model."
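
To illustrate the remark about item order, here is a minimal Python sketch. It is a brute-force support counter, not FP-growth and not Spark's implementation; the function name itemset_support and the max_size parameter are made up for this example. It assumes each record is treated as a set of items, so reordering items within a record leaves every itemset count unchanged.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, max_size=2):
    """Count how many transactions contain each itemset of up to max_size items.

    Hypothetical helper for illustration only; it enumerates itemsets by brute
    force rather than building an FP-tree.
    """
    counts = Counter()
    for record in transactions:
        # Assumption: a record is a set of items, so we reduce it to a
        # canonical sorted tuple of distinct items before counting.
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


original = [["a", "b", "c"], ["b", "a"], ["c", "a"]]
shuffled = [["c", "a", "b"], ["a", "b"], ["a", "c"]]

# Same records with items in a different order give identical counts.
assert itemset_support(original) == itemset_support(shuffled)
```

Because frequent-itemset mining depends only on these counts, any miner built on them would produce the same result for both inputs under the stated assumption.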
> FPGrowth input column naming
> ----------------------------
>
> Key: SPARK-19899
> URL: https://issues.apache.org/jira/browse/SPARK-19899
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Maciej Szymkiewicz
>
> The current implementation extends {{HasFeaturesCol}}. Personally I find it
> rather unfortunate. Until now we have used a consistent convention: if we
> mix in {{HasFeaturesCol}}, the {{featuresCol}} should be {{VectorUDT}}.
> Using the same {{Param}} for an {{array<T>}} (and possibly for
> {{array<array<T>>}} once {{PrefixSpan}} is ported to {{ml}}) will be
> confusing for users.
> I would like to suggest adding a new {{trait}} (let's say
> {{HasTransactionsCol}}) to clearly indicate that the input type differs from
> that of the other {{Estimators}}.