[ https://issues.apache.org/jira/browse/SPARK-29272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-29272.
----------------------------------
    Resolution: Invalid

> dataframe.write.format("libsvm").save() take too much time
> ----------------------------------------------------------
>
>                 Key: SPARK-29272
>                 URL: https://issues.apache.org/jira/browse/SPARK-29272
>             Project: Spark
>          Issue Type: Question
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: 张焕明
>            Priority: Major
>
> I have a PySpark DataFrame with about 10 thousand records. Dumping the whole
> dataset with the PySpark API takes 10 seconds. When I use the filter API to
> select 10 records and dump temp_df again, it takes 8 seconds. Why does it
> take so much time? How can I improve it? Thank you!
> MLUtils.convertVectorColumnsToML(dataframe).write.format("libsvm").save('path', mode='overwrite')
> temp_df = dataframe.filter(dataframe['__index'].between(0, 10))

--
This message was sent by Atlassian Jira
(v8.3.4#803005)