The join condition with && is throwing an exception:

    val df = baseDF.join(mccDF,
        mccDF("medical_claim_id") <=> baseDF("medical_claim_id") &&
          mccDF("medical_claim_detail_id") <=> baseDF("medical_claim_detail_id"),
        "left")
      .join(revCdDF, revCdDF("revenue_code_padded_str") <=> mccDF("mcc_code"), "left")
      .select(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"),
        baseDF("revenue_code"), baseDF("rev_code_distinct_count"),
        baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"),
        baseDF("er_visit_flag"), baseDF("observation_stay_flag"),
        revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy"))
      .where(revCdDF("rtos_2_code").between(8, 27).isNotNull)
      .groupBy(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"))
      .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")),
        min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier")))

This query runs fine:

    val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=> baseDF("medical_claim_id"), "left")
      .join(mccDF, mccDF("medical_claim_detail_id") <=> baseDF("medical_claim_detail_id"), "left")
      .join(revCdDF, revCdDF("revenue_code_padded_str") <=> mccDF("mcc_code"), "left")
      .select(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"),
        baseDF("revenue_code"), baseDF("rev_code_distinct_count"),
        baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"),
        baseDF("er_visit_flag"), baseDF("observation_stay_flag"),
        revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy"))
      .where(revCdDF("rtos_2_code").between(8, 27).isNotNull)
      .groupBy(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"))
      .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")),
        min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier")))

If I remove the multiple columns from the join condition and write a separate join statement for each one, the exception ("Join condition is missing or trivial. Use the CROSS JOIN syntax") goes away. Is there a better way to join on multiple columns?
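
For reference, here is a minimal sketch of two other ways to write a single left join on both key columns. The object name, the toy rows, and the aliases "b" and "m" are hypothetical and only there to make the sketch self-contained. One commonly reported cause of this error is that both DataFrames derive from the same source, so that mccDF("medical_claim_id") and baseDF("medical_claim_id") resolve to the same underlying column. Whether that applies here is an assumption, since the post does not show how the DataFrames were built.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical, self-contained sketch; not the original job.
object MultiColumnJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-column-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-ins for baseDF and mccDF (made-up rows).
    val baseDF = Seq((1L, 10L, "0450"), (2L, 20L, "0762"))
      .toDF("medical_claim_id", "medical_claim_detail_id", "revenue_code")
    val mccDF = Seq((1L, 10L, "0450"))
      .toDF("medical_claim_id", "medical_claim_detail_id", "mcc_code")

    // Option 1: alias each side and refer to columns through the alias,
    // so each side of every comparison is unambiguous. Keeps the
    // null-safe <=> comparison from the original query.
    val b = baseDF.as("b")
    val m = mccDF.as("m")
    val joinedWithAliases = b.join(
      m,
      col("b.medical_claim_id") <=> col("m.medical_claim_id") &&
        col("b.medical_claim_detail_id") <=> col("m.medical_claim_detail_id"),
      "left")
    joinedWithAliases.show()

    // Option 2: join on a sequence of shared column names. This is a plain
    // equi-join (not null-safe) and keeps a single copy of each key column.
    val joinedByName = baseDF.join(
      mccDF,
      Seq("medical_claim_id", "medical_claim_detail_id"),
      "left")
    joinedByName.show()

    spark.stop()
  }
}
```

Note that chaining two separate joins against mccDF, one per key, is not the same operation as a single join on both keys: each of the chained joins matches on one column only, so it can return different rows.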