[ https://issues.apache.org/jira/browse/SPARK-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-17629.
-------------------------------
    Resolution: Not A Problem

This is probably best as a question on dev@, not a JIRA. You can easily collect the DataFrame locally. I suppose this gives you the option.

> Should ml Word2Vec findSynonyms match the mllib implementation?
> ---------------------------------------------------------------
>
>                 Key: SPARK-17629
>                 URL: https://issues.apache.org/jira/browse/SPARK-17629
>             Project: Spark
>          Issue Type: Question
>            Reporter: Asher Krim
>            Priority: Minor
>
> ml Word2Vec's findSynonyms methods depart from mllib in that they return
> distributed results, rather than the results directly:
> {code}
> def findSynonyms(word: String, num: Int): DataFrame = {
>   val spark = SparkSession.builder().getOrCreate()
>   spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
> }
> {code}
> What was the reason for this decision? I would think that most users would
> request a reasonably small number of results back, and want to use them
> directly on the driver, similar to the _take_ method on dataframes. Returning
> parallelized results creates a costly round trip for the data that doesn't
> seem necessary.
> The original PR: https://github.com/apache/spark/pull/7263
> [~MechCoder] - do you perhaps recall the reason?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
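
For reference, "collect the DataFrame locally" could look roughly like the sketch below. It is only an illustration, not code from the issue or the PR: the toy corpus, the column names "text" and "features", the local master setting, and the object name FindSynonymsLocally are all assumptions made for the example. It trains a tiny ml Word2Vec model, calls findSynonyms, and pulls the small result back to the driver as a plain array.

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark or of the issue.
object FindSynonymsLocally {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: run locally for the demo
      .appName("word2vec-synonyms")
      .getOrCreate()
    import spark.implicits._

    // Toy corpus (made up): one tokenized sentence per row.
    val docs = Seq(
      "spark runs word2vec on text",
      "word2vec maps words to vectors",
      "spark collects results on the driver"
    ).map(_.split(" ").toSeq).map(Tuple1.apply).toDF("text")

    val model = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("features")
      .setVectorSize(8)
      .setMinCount(0)                   // keep every word of the tiny corpus
      .fit(docs)

    // ml findSynonyms returns a DataFrame with columns "word" and "similarity";
    // collect() brings the (small) result back to the driver.
    val synonyms: Array[(String, Double)] = model
      .findSynonyms("spark", 2)
      .collect()
      .map(row => (row.getAs[String]("word"), row.getAs[Double]("similarity")))

    synonyms.foreach { case (word, sim) => println(s"$word\t$sim") }
    spark.stop()
  }
}
```

The collect() call is the extra round trip the reporter describes: the synonyms are computed on the driver, wrapped in a DataFrame, and then brought back as local rows.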