[
https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225254#comment-15225254
]
Joseph K. Bradley commented on SPARK-13629:
-------------------------------------------
I just realized that we should also have added the binary toggle Param to
CountVectorizer (the Estimator). All Estimators need to contain their Models'
Params so that users can configure the whole Pipeline/Estimator before running
fit. I'll create a JIRA for that, and I'll create and link one for this and
HashingTF.
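
To illustrate the point about Estimators carrying Model Params, here is a
minimal plain-Python sketch. It does not use Spark: ToyEstimator, ToyModel, and
vocab_size are hypothetical stand-ins for CountVectorizer and
CountVectorizerModel, assumed only for illustration. The idea is that a Param
set on the Estimator before fit ends up on the fitted Model.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical fitted model; stands in for CountVectorizerModel."""
    vocabulary: list
    binary: bool = False  # Param consulted at transform time


@dataclass
class ToyEstimator:
    """Hypothetical estimator; stands in for CountVectorizer."""
    vocab_size: int = 1000  # used only while fitting
    binary: bool = False    # Model Param, mirrored so it can be set before fit

    def fit(self, documents):
        # Learn a sorted vocabulary, truncated to vocab_size terms.
        terms = sorted({t for doc in documents for t in doc})[: self.vocab_size]
        # Hand the mirrored Param to the model so no post-fit tweak is needed.
        return ToyModel(vocabulary=terms, binary=self.binary)


model = ToyEstimator(binary=True).fit([["a", "b"], ["b", "c"]])
print(model.binary, model.vocabulary)
```

Without the mirrored Param, a user would have to fit first and then set binary
on the resulting Model, which does not work when the Estimator is one stage of
a Pipeline that is configured up front.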
> Add binary toggle Param to CountVectorizer
> ------------------------------------------
>
> Key: SPARK-13629
> URL: https://issues.apache.org/jira/browse/SPARK-13629
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Joseph K. Bradley
> Assignee: yuhao yang
> Priority: Minor
> Fix For: 2.0.0
>
>
> It would be handy to add a binary toggle Param to CountVectorizer, as in the
> scikit-learn one:
> [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html]
> If set, then all non-zero counts will be set to 1.
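
The requested behaviour can be sketched in plain Python as follows. This is an
illustration of the semantics only, not Spark's implementation; count_vector
and its arguments are hypothetical names chosen for the example.

```python
from collections import Counter


def count_vector(tokens, vocabulary, binary=False):
    """Return per-term counts for `tokens`, in `vocabulary` order."""
    counts = Counter(tokens)
    # A Counter returns 0 for terms that never occurred.
    row = [counts[term] for term in vocabulary]
    if binary:
        # Binary toggle: every non-zero count becomes 1.
        row = [1 if c > 0 else 0 for c in row]
    return row


vocab = ["spark", "ml", "jira"]
tokens = ["spark", "spark", "ml"]
print(count_vector(tokens, vocab))               # [2, 1, 0]
print(count_vector(tokens, vocab, binary=True))  # [1, 1, 0]
```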