zhangsongcheng created SPARK-24078:
--------------------------------------

             Summary: reduce with unionAll takes a long time
                 Key: SPARK-24078
                 URL: https://issues.apache.org/jira/browse/SPARK-24078
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 1.6.3
            Reporter: zhangsongcheng


I am trying to sample the training set per category and then union all the samples
together. This is my code:
{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Under-sample each category separately, then union the per-category samples.
def balanceCategory(dataSet: DataFrame): DataFrame = {
  val samples = LabelConf.categorys.map { category =>
    val tmpDataSet = dataSet.filter(col("category_id") === category)
    val sample = underSample(tmpDataSet, category)
    sample
  }
  samples.reduce((x, y) => x.unionAll(y))
}

// Keep 10% of the positive rows and 10% of the negative rows of one category.
def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
  val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)
  val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)
  positiveSample.unionAll(negativeSample)
}
{code}
 
But the code gets stuck at {{samples.reduce((x, y) => x.unionAll(y))}}: it runs
slower and slower and eventually stops making progress. This has confused me for a
long time. Can anyone help? Thank you!
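One plausible explanation (an assumption, not confirmed in this report): in Spark 1.6, {{unionAll}} appears to wrap its two inputs in a binary {{Union}} logical-plan node, and each new DataFrame is analyzed eagerly. {{samples.reduce((x, y) => x.unionAll(y))}} over N categories would then build a left-deep plan N-1 levels deep and re-analyze the growing plan at every step. The plain-Scala sketch below uses no Spark at all; {{Plan}}, {{Leaf}}, {{Union}}, {{UnionDepth}} and {{treeReduce}} are hypothetical names. It models only the plan shape and compares that chain with a pairwise, balanced reduction whose depth is about log2(N).

```scala
// Toy model of a logical plan (hypothetical, not Spark's classes):
// leaves stand for the per-category samples, Union mirrors a binary union node.
sealed trait Plan
final case class Leaf(id: Int) extends Plan
final case class Union(left: Plan, right: Plan) extends Plan

object UnionDepth {
  // Height of the plan tree; a single leaf has depth 0.
  def depth(p: Plan): Int = p match {
    case Leaf(_)     => 0
    case Union(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // Combine neighbours pairwise, level by level, so the result is a balanced
  // tree of depth ceil(log2(n)) instead of a left-deep chain of depth n - 1.
  def treeReduce[T](xs: Seq[T])(f: (T, T) => T): T = {
    require(xs.nonEmpty, "cannot reduce an empty sequence")
    var level: Seq[T] = xs.toVector
    while (level.size > 1) {
      // A trailing odd element is carried up to the next level unchanged.
      level = level.grouped(2).map(g => if (g.size == 2) f(g(0), g(1)) else g.head).toVector
    }
    level.head
  }

  def main(args: Array[String]): Unit = {
    val leaves: Seq[Plan] = (1 to 100).map(Leaf(_))
    val leftDeep = leaves.reduce((x, y) => Union(x, y))          // what reduce builds
    val balanced = treeReduce(leaves)((x, y) => Union(x, y))     // pairwise alternative
    println(s"left-deep depth: ${depth(leftDeep)}") // prints 99
    println(s"balanced depth: ${depth(balanced)}")  // prints 7
  }
}
```

If the deep plan is the bottleneck, the same hypothetical helper could be applied as {{UnionDepth.treeReduce(samples)((x, y) => x.unionAll(y))}}. It does not remove the cost of filtering {{dataSet}} once per category, so caching {{dataSet}} first may also help. Another commonly suggested workaround is to union at the RDD level with {{sc.union(samples.map(_.rdd))}} and rebuild a DataFrame with {{sqlContext.createDataFrame(rdd, dataSet.schema)}}.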



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
