Vitalii Slobodianyk created SPARK-9236:
------------------------------------------

             Summary: Left Outer Join with empty JavaPairRDD returns empty RDD
                 Key: SPARK-9236
                 URL: https://issues.apache.org/jira/browse/SPARK-9236
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.4.1, 1.3.1
            Reporter: Vitalii Slobodianyk


When a *left outer join* is performed on a non-empty {{JavaPairRDD}} with a 
{{JavaPairRDD}} that was derived from an RDD created with the {{emptyRDD()}} 
method, the resulting RDD is empty, although a left outer join should keep 
every element of the left-hand RDD. In the following unit test, the last 
assertion fails.

{code}
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;

import lombok.val;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.Test;

import scala.Tuple2;

public class SparkTest {

  @Test
  public void joinEmptyRDDTest() {
    val sparkConf = new SparkConf().setAppName("test").setMaster("local");

    try (val sparkContext = new JavaSparkContext(sparkConf)) {
      val oneRdd = sparkContext.parallelize(Collections.singletonList("one"));
      val twoRdd = sparkContext.parallelize(Collections.singletonList("two"));
      val threeRdd = sparkContext.emptyRDD();

      val onePair = oneRdd.mapToPair(t -> new Tuple2<Integer, String>(1, t));
      val twoPair = twoRdd.groupBy(t -> 1);
      val threePair = threeRdd.groupBy(t -> 1);

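      // Passes: the right-hand side is non-empty.
      assertThat(onePair.leftOuterJoin(twoPair).collect()).isNotEmpty();
      // Fails: the right-hand side comes from emptyRDD(), and the join result is empty.
      assertThat(onePair.leftOuterJoin(threePair).collect()).isNotEmpty();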
    }
  }

}
{code}
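
For reference, the behaviour the test expects can be modelled without Spark. The following is only a minimal plain-Java sketch of left outer join semantics, not Spark's implementation: the class name {{LeftOuterJoinSketch}} and its {{leftOuterJoin}} helper are hypothetical, in-memory lists stand in for the RDDs, and {{java.util.Optional}} stands in for whatever optional type the Spark Java API returns. It shows that every left-hand element should survive the join even when the right-hand side is empty.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration class; not part of Spark.
public class LeftOuterJoinSketch {

  // Joins two in-memory lists of key/value pairs with left outer join semantics:
  // each left element appears at least once in the result, paired with
  // Optional.empty() when no right element shares its key.
  public static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
      List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
    List<Map.Entry<K, Map.Entry<V, Optional<W>>>> result = new ArrayList<>();
    for (Map.Entry<K, V> l : left) {
      boolean matched = false;
      for (Map.Entry<K, W> r : right) {
        if (l.getKey().equals(r.getKey())) {
          Map.Entry<V, Optional<W>> pair =
              new SimpleEntry<>(l.getValue(), Optional.of(r.getValue()));
          result.add(new SimpleEntry<>(l.getKey(), pair));
          matched = true;
        }
      }
      if (!matched) {
        // No match on the right: keep the left element anyway.
        Map.Entry<V, Optional<W>> pair =
            new SimpleEntry<>(l.getValue(), Optional.<W>empty());
        result.add(new SimpleEntry<>(l.getKey(), pair));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> left = new ArrayList<>();
    left.add(new SimpleEntry<>(1, "one"));
    // Empty right-hand side, analogous to the pair RDD built from emptyRDD().
    List<Map.Entry<Integer, String>> right = new ArrayList<>();
    System.out.println(leftOuterJoin(left, right).size());
  }
}
```

With an empty right-hand list the sketch still returns one entry for key 1 with an absent right value, which is the result the failing assertion in the test above expects from {{leftOuterJoin}}.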


