[jira] [Commented] (SPARK-19899) FPGrowth input column naming

yuhao yang (JIRA) Fri, 10 Mar 2017 08:31:39 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905367#comment-15905367
 ]


yuhao yang commented on SPARK-19899:
------------------------------------

Thanks for the suggestion. I'm neutral on this. I'm not sure we need a new 
trait for each data type. Also, how do you plan to handle array<array<T>> for 
PrefixSpan? That said, this reminds me that we could support vector input for 
FPGrowth, and perhaps add a note to the documentation: "the order of items in 
each record does not affect the training process or the model."
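
To illustrate the point about item order, here is a minimal sketch in plain 
Python. It is a brute-force support counter, not FP-growth and not Spark code; 
the function name frequent_itemsets and the sample data are made up for this 
example. It assumes each record is treated as a set of items, so the order in 
which items appear in a record cannot change the result.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset counts (illustrative, not FP-growth)."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Treat the record as a set and put it in canonical order,
        # so the order of items in the input record is discarded.
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    # Keep only itemsets whose relative support reaches the threshold.
    return {s: c for s, c in counts.items() if c / n >= min_support}


# Hypothetical sample data: the same records, with items reversed.
original = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
reordered = [list(reversed(t)) for t in original]

# Same frequent itemsets and counts regardless of item order.
assert frequent_itemsets(original, 0.5) == frequent_itemsets(reordered, 0.5)
```

This is exponential in record length and only meant to show the order 
independence on toy data.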



> FPGrowth input column naming
> ----------------------------
>
>                 Key: SPARK-19899
>                 URL: https://issues.apache.org/jira/browse/SPARK-19899
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Maciej Szymkiewicz
>
> The current implementation extends {{HasFeaturesCol}}. Personally I find it 
> rather unfortunate. Up to now we have used a consistent convention: if we 
> mix in {{HasFeaturesCol}}, the {{featuresCol}} should be {{VectorUDT}}. 
> Using the same {{Param}} for an {{array<T>}} (and possibly for 
> {{array<array<T>>}} once {{PrefixSpan}} is ported to {{ml}}) will be 
> confusing for users.
> I would like to suggest adding a new {{trait}} (let's say 
> {{HasTransactionsCol}}) to clearly indicate that the input type differs from 
> that of the other {{Estimators}}.


