zhangsongcheng created SPARK-24078:
--------------------------------------

Summary: reduce with unionAll takes a long time
Key: SPARK-24078
URL: https://issues.apache.org/jira/browse/SPARK-24078
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.6.3
Reporter: zhangsongcheng

I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{import org.apache.spark.sql.DataFrame}}
{{import org.apache.spark.sql.functions.col}}

{{// Sample each category separately, then union the per-category samples.}}
{{def balanceCategory(dataSet: DataFrame): DataFrame = {}}
{{  val samples = LabelConf.categorys.map { category =>}}
{{    val tmpDataSet = dataSet.filter(col("category_id") === category)}}
{{    val sample = underSample(tmpDataSet, category)}}
{{    sample}}
{{  }}}
{{  samples.reduce((x, y) => x.unionAll(y))}}
{{}}}

{{// Take a 10% sample (without replacement) of the positive and of the negative rows.}}
{{def underSample(dataSet: DataFrame, cardID: String): DataFrame = {}}
{{  val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{  val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{  positiveSample.unionAll(negativeSample)}}
{{}}}

But the code gets stuck at {{samples.reduce((x, y) => x.unionAll(y))}}: it runs slower and slower, and eventually cannot run at all. This has confused me for a long time. Can anyone help me? Thank you!