Vitalii Slobodianyk created SPARK-9236:
------------------------------------------
Summary: Left Outer Join with empty JavaPairRDD returns empty RDD
Key: SPARK-9236
URL: https://issues.apache.org/jira/browse/SPARK-9236
Project: Spark
Issue Type: Bug
Affects Versions: 1.4.1, 1.3.1
Reporter: Vitalii Slobodianyk
When a *left outer join* is performed on a non-empty {{JavaPairRDD}} with a
{{JavaPairRDD}} that was created with the {{emptyRDD()}} method, the resulting
RDD is empty, although a left outer join should keep every entry of the left
RDD. In the following unit test, the last assertion fails.
{code}
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;

import lombok.val;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.Test;
import scala.Tuple2;

public class SparkTest {

    @Test
    public void joinEmptyRDDTest() {
        val sparkConf = new SparkConf().setAppName("test").setMaster("local");

        try (val sparkContext = new JavaSparkContext(sparkConf)) {
            val oneRdd = sparkContext.parallelize(Collections.singletonList("one"));
            val twoRdd = sparkContext.parallelize(Collections.singletonList("two"));
            val threeRdd = sparkContext.emptyRDD();

            val onePair = oneRdd.mapToPair(t -> new Tuple2<Integer, String>(1, t));
            val twoPair = twoRdd.groupBy(t -> 1);
            val threePair = threeRdd.groupBy(t -> 1);

            assertThat(onePair.leftOuterJoin(twoPair).collect()).isNotEmpty();
            assertThat(onePair.leftOuterJoin(threePair).collect()).isNotEmpty();
        }
    }
}
{code}
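For reference, the semantics the failing assertion relies on can be sketched in plain Java, without Spark. This is only an illustration under stated assumptions: {{LeftOuterJoinSketch}} and its {{leftOuterJoin}} helper are hypothetical names, lists of map entries stand in for pair RDDs, and {{java.util.Optional}} stands in for whatever optional type the Spark Java API returns. It is not Spark's implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, Spark-free sketch of left outer join semantics.
public class LeftOuterJoinSketch {

    // Every left entry appears in the output at least once. If no right
    // entry shares its key, it is paired with an empty Optional.
    static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, V> l : left) {
            boolean matched = false;
            for (Map.Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    Map.Entry<V, Optional<W>> pair =
                            new SimpleEntry<>(l.getValue(), Optional.of(r.getValue()));
                    out.add(new SimpleEntry<>(l.getKey(), pair));
                    matched = true;
                }
            }
            if (!matched) {
                // No match on the right side: keep the left entry anyway.
                Map.Entry<V, Optional<W>> pair =
                        new SimpleEntry<>(l.getValue(), Optional.<W>empty());
                out.add(new SimpleEntry<>(l.getKey(), pair));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> left = new ArrayList<>();
        left.add(new SimpleEntry<>(1, "one"));

        // Empty right side, analogous to the pair RDD built from emptyRDD().
        List<Map.Entry<Integer, String>> right = new ArrayList<>();

        // The left entry survives the join, paired with an empty Optional.
        System.out.println(leftOuterJoin(left, right).size());
    }
}
```

With an empty right side the sketch still returns one entry per left entry, which is the behaviour the second assertion in the unit test expects from {{leftOuterJoin}}.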