Xiangrui Meng created SPARK-16149:
-------------------------------------

             Summary: API consistency discussion: CountVectorizer.{minDF -> minDocFreq, minTF -> minTermFreq}
                 Key: SPARK-16149
                 URL: https://issues.apache.org/jira/browse/SPARK-16149
             Project: Spark
          Issue Type: Brainstorming
          Components: MLlib
    Affects Versions: 2.0.0
            Reporter: Xiangrui Meng


CountVectorizer uses `minDF` and `minTF`, while IDF uses `minDocFreq`. It 
would be nice to keep the naming consistent. This was discussed in 
https://github.com/apache/spark/pull/7388, and the current names were chosen 
for sklearn compatibility. However, we didn't look broadly across MLlib APIs 
at the time. Maybe we can live with this small inconsistency, but it would be 
nice to settle on a guideline: should parameter names be consistent with other 
libraries or with existing ones in MLlib?
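
To make the rename in the summary concrete, here is a minimal sketch in plain 
Python (not Spark code) of one way the old and proposed names could coexist 
during a transition. The helper `normalize_params` and the table 
`PROPOSED_RENAMES` are hypothetical and exist only for illustration; the four 
parameter names are the ones quoted in this issue, and nothing here reflects a 
decision or an existing MLlib mechanism.

```python
import warnings

# Renames proposed in the issue summary (current CountVectorizer name -> new name).
PROPOSED_RENAMES = {"minDF": "minDocFreq", "minTF": "minTermFreq"}


def normalize_params(params):
    """Hypothetical helper: map old-style parameter names to the proposed ones.

    Emits a DeprecationWarning for each old name and raises ValueError if the
    same parameter is supplied under both its old and its new name.
    """
    normalized = {}
    for name, value in params.items():
        new_name = PROPOSED_RENAMES.get(name, name)
        if new_name != name:
            warnings.warn(
                f"{name} is deprecated; use {new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
        if new_name in normalized:
            raise ValueError(f"{new_name} given more than once")
        normalized[new_name] = value
    return normalized
```

With this sketch, `normalize_params({"minDF": 2.0})` returns 
`{"minDocFreq": 2.0}` and issues a deprecation warning, so callers using the 
sklearn-style names would keep working while the MLlib-style names become the 
documented ones.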

cc: [~josephkb] [~yuhaoyan]


