Asher Krim created SPARK-17629:

             Summary: Should ml Word2Vec findSynonyms match the mllib 
                 Key: SPARK-17629
             Project: Spark
          Issue Type: Question
            Reporter: Asher Krim
            Priority: Minor

ml Word2Vec's findSynonyms methods differ from their mllib counterparts in that 
they return the results as a distributed DataFrame, rather than as a local 
collection on the driver:

  def findSynonyms(word: String, num: Int): DataFrame = {
    val spark = SparkSession.builder().getOrCreate()
    spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", 

What was the reason for this decision? I would think that most users would 
request a reasonably small number of results back, and want to use them 
directly on the driver, similar to the _take_ method on DataFrames. Returning 
parallelized results creates a costly round trip for the data that doesn't seem 
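
For illustration, here is a minimal sketch of what a caller currently has to do 
to get the synonyms back as a local collection. It assumes a Spark 2.x 
dependency and a local master. The object FindSynonymsOnDriver, the helper 
synonymsOnDriver, and the toy corpus are hypothetical names made up for this 
example and are not part of Spark. Columns are read by position, since only 
the first column name is visible in the snippet above.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.SparkSession

object FindSynonymsOnDriver {

  // Hypothetical helper (not a Spark API): collect the small DataFrame
  // returned by ml's findSynonyms into a local array on the driver.
  def synonymsOnDriver(model: Word2VecModel, word: String, num: Int): Array[(String, Double)] =
    model.findSynonyms(word, num)
      .collect()                                       // extra Spark job just to fetch `num` rows
      .map(row => (row.getString(0), row.getDouble(1)))

  // Fits a tiny model on a made-up corpus and returns two synonyms for "Spark".
  def run(spark: SparkSession): Array[(String, Double)] = {
    val docs = spark.createDataFrame(Seq(
      "Hi I heard about Spark".split(" "),
      "I wish Java could use case classes".split(" "),
      "Logistic regression models are neat".split(" ")
    ).map(Tuple1.apply)).toDF("text")

    val model = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("result")
      .setVectorSize(3)
      .setMinCount(0)   // keep every word, the corpus is tiny
      .fit(docs)

    synonymsOnDriver(model, "Spark", 2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("FindSynonymsOnDriver")
      .getOrCreate()
    try {
      run(spark).foreach { case (w, sim) => println(s"$w $sim") }
    } finally {
      spark.stop()
    }
  }
}
```

With the mllib model, the same call already returns an Array[(String, Double)], 
so the collect() and row-mapping step is not needed there.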

The original PR:
[~MechCoder] - do you perhaps recall the reason?
