Nikolas Vanderhoof created SPARK-27359:
------------------------------------------
Summary: Joins on some array functions can be optimized
Key: SPARK-27359
URL: https://issues.apache.org/jira/browse/SPARK-27359
Project: Spark
Issue Type: Improvement
Components: Optimizer, SQL
Affects Versions: 3.0.0
Reporter: Nikolas Vanderhoof
I encounter these cases frequently, and implemented the optimization manually
(as shown here). If others experience this as well, perhaps it would be good to
add appropriate tree transformations into catalyst. I can create some rough
draft implementations but expect I will need assistance when it comes to
resolving the generating expressions in the logical plan.
h1. Case 1
A join like this:
{code:scala}
left.join(
right,
arrays_overlap(left("a"), right("b")) // Creates a cartesian product in
the logical plan
)
{code}
will produce the same results as:
{code:scala}
{
val leftPrime = left.withColumn("exploded_a", explode(col("a")))
val rightPrime = right.withColumn("exploded_b", explode(col("b")))
leftPrime.join(
rightPrime,
leftPrime("exploded_a") === rightPrime("exploded_b")
// Equijoin doesn't produce cartesian product
).drop("exploded_a", "exploded_b").distinct
}
{code}
h1. Case 2
A join like this:
{code:scala}
left.join(
right,
array_contains(left("arr"), right("value")) // Cartesian product in logical
plan
)
{code}
will produce the same results as:
{code:scala}
{
val leftPrime = left.withColumn("exploded_arr", explode(col("arr")))
leftPrime.join(
right,
leftPrime("exploded_arr") === right("value") // Fast equijoin
).drop("exploded_arr").distinct
}
{code}
h1. Case 3
A join like this:
{code:scala}
left.join(
right,
array_contains(right("arr"), left("value")) // Cartesian product in logical
plan
)
{code}
will produce the same results as:
{code:scala}
{
val rightPrime = right.withColumn("exploded_arr", explode(col("arr")))
left.join(
rightPrime,
left("value") === rightPrime("exploded_arr") // Fast equijoin
).drop("exploded_arr").distinct
}
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]