William Benton created SPARK-17595:
--------------------------------------

             Summary: Inefficient selection in Word2VecModel.findSynonyms
                 Key: SPARK-17595
                 URL: https://issues.apache.org/jira/browse/SPARK-17595
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 2.0.0
            Reporter: William Benton
            Priority: Minor

The code in `Word2VecModel.findSynonyms` that selects the vocabulary elements most similar to the query vector currently sorts the similarities for every vocabulary element. This makes multiple copies of the collection of similarities and performs a (relatively) expensive sort over the whole vocabulary. It would be more efficient to find the best matches in a single pass over the vocabulary while maintaining a bounded priority queue.
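
To illustrate the proposal, here is a minimal sketch in Python of the two selection strategies. It is not the Spark code: the function names, the dictionary-based vocabulary, and the use of cosine similarity are assumptions made for the example. The first function scores and sorts every entry; the second keeps only the best k entries in a bounded min-heap while making one pass over the vocabulary.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_sort(
    vocab: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Hypothetical baseline: score everything, sort everything, take k."""
    scored = [(cosine_similarity(vec, query), word) for word, vec in vocab.items()]
    scored.sort(reverse=True)
    return [(word, sim) for sim, word in scored[:k]]


def top_k_by_bounded_heap(
    vocab: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Hypothetical alternative: one pass with a heap that never exceeds k entries."""
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap; heap[0] is the weakest kept match
    for word, vec in vocab.items():
        entry = (cosine_similarity(vec, query), word)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Evict the weakest kept match in favor of the better one.
            heapq.heapreplace(heap, entry)
    # Only the k survivors are sorted, best match first.
    return [(word, sim) for sim, word in sorted(heap, reverse=True)]


if __name__ == "__main__":
    # Toy two-dimensional vocabulary, invented for the example.
    toy_vocab = {
        "king": [1.0, 0.0],
        "queen": [0.9, 0.1],
        "apple": [0.0, 1.0],
        "pear": [0.1, 0.9],
    }
    print(top_k_by_bounded_heap(toy_vocab, [1.0, 0.0], 2))
```

For a vocabulary of n words, the sorted version does O(n log n) comparisons and materializes all n scored pairs, whereas the heap version does O(n log k) comparisons and holds at most k pairs at a time. The standard library's `heapq.nlargest` would give the same selection; the loop is written out here only to make the bounded queue explicit.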