I am using the test below. As a unit test it will pass, because simulating
executor loss in a single VM is difficult, but there is definitely a bug.
If you check with a debugger, ShuffleStage.isDeterminate turns out to be
true, though it clearly should not be.
As a result, if you look at the DagScheduler, TaskSchedulerImpl and TaskSet
code, the ShuffleStage is not retried fully; only the missing tasks are
retried.
This means that on retry, if the join key gets a new random value that routes
a row to an already completed shuffle task, that row's results will be missed.
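The failure mode above can be illustrated with a toy model in plain Scala. It is not Spark code: `PartialRetrySketch`, `runMapTask` and `rowsSeen` are hypothetical names, the salts are fixed by hand so the run is reproducible (in the real query they come from rand()), and the map task writes to only two reducers. When only the failed reducer reads the retried map output, the salted rows land in a reducer that already finished, so they are lost.

```scala
// Toy model of a salted shuffle with a partial retry; not Spark code.
// All names are hypothetical.
object PartialRetrySketch {
  val numReducers = 2

  // For small non-negative keys Long.hashCode(k) == k, so the bucket is k % 2.
  def bucket(key: Long): Int = Math.floorMod(java.lang.Long.hashCode(key), numReducers)

  /** One map-task attempt: a null key (None) is salted with the next value from `salts`. */
  def runMapTask(rows: Seq[Option[Long]], salts: Iterator[Long]): Map[Int, Seq[Long]] =
    rows.map(_.getOrElse(salts.next())).groupBy(bucket)

  /** Rows read by the reducers when only `rerunReducers` read the retried map output. */
  def rowsSeen(first: Map[Int, Seq[Long]], retry: Map[Int, Seq[Long]],
               rerunReducers: Set[Int]): Seq[Long] =
    (0 until numReducers).flatMap { r =>
      val source = if (rerunReducers(r)) retry else first
      source.getOrElse(r, Seq.empty)
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(Some(1L), None, Some(2L), None)
    // The salt is random, so two attempts of the same map task draw different values.
    val first = runMapTask(input, Iterator(11L, 21L)) // salted rows go to reducer 1
    val retry = runMapTask(input, Iterator(10L, 20L)) // salted rows go to reducer 0
    // Reducer 0 already finished against `first`; only reducer 1 is retried.
    val partial = rowsSeen(first, retry, rerunReducers = Set(1))
    println(partial.sorted.mkString(", ")) // 1, 2  (both salted rows are lost)
    // Rerunning every reducer against the same attempt keeps all rows.
    val full = rowsSeen(first, retry, rerunReducers = Set(0, 1))
    println(full.size) // 4
  }
}
```

With other salt values the same partial retry can instead read a salted row twice; only rerunning all reducers against one consistent attempt returns every input row exactly once.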

> package org.apache.spark.sql.vectorized
>
> import org.apache.spark.rdd.ZippedPartitionsRDD2
> import org.apache.spark.sql.{DataFrame, Encoders, QueryTest}
> import org.apache.spark.sql.catalyst.expressions.Literal
> import org.apache.spark.sql.execution.datasources.FileScanRDD
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.internal.SQLConf
> import org.apache.spark.sql.test.SharedSparkSession
> import org.apache.spark.sql.types.LongType
>
> class BugTest extends QueryTest with SharedSparkSession {
>   import testImplicits._
>   /* test("no retries") {
>     withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
>       val baseDf = spark.createDataset(
>         Seq((1L, "a"), (2L, "b"), (3L, "c"), (null, "w"), (null, "x"), (null, "y"), (null, "z")))(
>         Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkLeft", "strleft")
>
>       val leftOuter = baseDf.select(
>         $"strleft", when(isnull($"pkLeft"), monotonically_increasing_id() + Literal(100)).
>           otherwise($"pkLeft").as("pkLeft"))
>       leftOuter.show(10000)
>
>       val innerRight = spark.createDataset(
>         Seq((1L, "11"), (2L, "22"), (3L, "33")))(
>         Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkRight", "strright")
>
>       val innerjoin = leftOuter.join(innerRight, $"pkLeft" === $"pkRight", "inner")
>
>       innerjoin.show(1000)
>
>       val outerjoin = leftOuter.join(innerRight, $"pkLeft" === $"pkRight", "left_outer")
>
>       outerjoin.show(1000)
>     }
>   } */
>
>   test("with retries") {
>     withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
>       withTable("outer", "inner") {
>         createBaseTables()
>         val outerjoin: DataFrame = getOuterJoinDF
>
>         println("Initial data")
>         outerjoin.show(1000)
>         val correctRows = outerjoin.collect()
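>         for (_ <- 0 until 100) {
>           // FileScanRDD.throwException and ZippedPartitionsRDD2.throwException appear
>           // to be fault-injection flags added locally for this test; they are not part
>           // of upstream Spark. Setting one to true is assumed to make that RDD's tasks
>           // throw, so that the stage gets retried.
>           FileScanRDD.throwException = false
>           ZippedPartitionsRDD2.throwException = true
>           val rowsAfterRetry = getOuterJoinDF.collect()
>           import scala.jdk.CollectionConverters._
>           val temp = spark.createDataFrame(rowsAfterRetry.toSeq.asJava, outerjoin.schema)
>           println("after retry data")
>           temp.show(1000)
>           assert(correctRows.length == rowsAfterRetry.length)
>           val retriedResults = rowsAfterRetry.toBuffer
>           // Columns: 0 = strleft, 1 = pkLeft, 2 = pkRight, 3 = strright.
>           // Each original row must match exactly one row of the retried result.
>           // A salted pkLeft is regenerated on every run, so it is only compared
>           // when the row found a match on the right side. Row.get is used for the
>           // nullable right-side columns because getLong throws on null.
>           correctRows.foreach { r =>
>             val index = retriedResults.indexWhere(x =>
>               r.getString(0) == x.getString(0) &&
>                 (r.getLong(1) == x.getLong(1) ||
>                   (r.isNullAt(2) && x.isNullAt(2) && r.isNullAt(3) && x.isNullAt(3))) &&
>                 r.get(2) == x.get(2) &&
>                 r.get(3) == x.get(3))
>             assert(index >= 0)
>             retriedResults.remove(index)
>           }
>           assert(retriedResults.isEmpty)
>         }
>
>         // Thread.sleep(10000000)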
>       }
>     }
>   }
>
>   private def createBaseTables(): Unit = {
>     /*val outerDf = spark.createDataset(
>       Seq((1L, "aa"), (null, "aa"), (2L, "bb"), (null, "bb"), (3L, "cc"), (null, "cc")))(
>       Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkLeft", "strleft")
>     outerDf.write.format("parquet").saveAsTable("outer")*/
>
>     val outerDf = spark.createDataset(
>       Seq((1L, "aa"), (null, "aa"), (2L, "aa"), (null, "bb"), (3L, "bb"), (null, "bb")))(
>       Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkLeftt", "strleft")
>     outerDf.write.format("parquet").partitionBy("strleft").saveAsTable("outer")
>
>     /*val innerDf = spark.createDataset(
>       Seq((1L, "11"), (2L, "22"), (3L, "33")))(
>       Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkRight", "strright")*/
>
>     val innerDf = spark.createDataset(
>       Seq((1L, "11"), (2L, "11"), (3L, "33")))(
>       Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkRight", "strright")
>
>     innerDf.write.format("parquet").partitionBy("strright").saveAsTable("inner")
>
>     val innerInnerDf = spark.createDataset(
>       Seq((1L, "111"), (2L, "222"), (3L, "333")))(
>       Encoders.tupleEncoder(Encoders.LONG, Encoders.STRING)).toDF("pkpkRight", "strstrright")
>
>     // innerInnerDf.write.format("parquet").saveAsTable("innerinner")
>   }
>
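>   private def getOuterJoinDF = {
>     // Salting: a null pkLeftt is replaced by a random key. rand() is
>     // nondeterministic (its output depends on the partition and on the order of
>     // the input rows), so a recomputed task is not guaranteed to produce the same
>     // key and may send the row to a different shuffle partition.
>     val leftOuter = spark.table("outer").select(
>       $"strleft", when(isnull($"pkLeftt"), floor(rand() * Literal(10000000L)).
>         cast(LongType)).
>         otherwise($"pkLeftt").as("pkLeft"))
>
>     val innerRight = spark.table("inner")
>     // val innerinnerRight = spark.table("innerinner")
>
>     // Join hint requesting a shuffled hash join.
>     val outerjoin = leftOuter.hint("shuffle_hash").
>       join(innerRight, $"pkLeft" === $"pkRight", "left_outer")
>     outerjoin
>     /*
>     val outerOuterJoin = outerjoin.hint("shuffle_hash").
>       join(innerinnerRight, $"pkLeft" === $"pkpkRight", "left_outer")
>     outerOuterJoin
>     */
>   }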
> }
>
On Sun, Jan 26, 2025 at 10:05 PM Asif Shahid <asif.sha...@gmail.com> wrote:

> Sure. I will send a prototypical query tomorrow. It is difficult to
> simulate the issue in a unit test, but I think the problem is that
> Rdd.isIndeterminate does not return true for this query. As a result, on
> retry, the shuffle stage is not reattempted fully.
> The RDD does not report itself as indeterminate because ShuffleRdd does not
> account for it: the ShuffleDependency does not take into account the
> indeterminate nature of the HashPartitioner.
> Moreover, the attribute ref provided to the HashPartitioner, though it
> points to an indeterminate alias, has its deterministic flag set to true
> (which in itself is logically correct).
> So I feel that, apart from the ShuffleDependency code change, all
> expressions should have a lazy Boolean, say containsIndeterministicComponent,
> which is true if the expression is indeterministic or contains any
> attribute ref whose containsIndeterministicComponent is true.
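A minimal sketch of the proposed flag on a toy expression tree, in plain Scala rather than Catalyst: `Expr`, `AttrRef`, `Alias` and the other classes are hypothetical stand-ins, not Spark classes, and the flag name follows the proposal above. The reference itself stays deterministic, as a leaf is today, while `containsIndeterministicComponent` is inherited from the alias it points to.

```scala
// Hypothetical toy expression tree; these are not Spark's Catalyst classes.
object IndeterminismSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    /** Whether this node by itself is deterministic. */
    def deterministic: Boolean
    /** Proposed flag: true if this node or anything it depends on is indeterministic. */
    lazy val containsIndeterministicComponent: Boolean =
      !deterministic || children.exists(_.containsIndeterministicComponent)
  }

  case object Rand extends Expr {
    def children: Seq[Expr] = Nil
    def deterministic: Boolean = false
  }

  final case class Lit(value: Long) extends Expr {
    def children: Seq[Expr] = Nil
    def deterministic: Boolean = true
  }

  final case class Multiply(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def deterministic: Boolean = children.forall(_.deterministic)
  }

  final case class Alias(child: Expr, name: String) extends Expr {
    def children: Seq[Expr] = Seq(child)
    def deterministic: Boolean = child.deterministic
  }

  // A leaf referring to an Alias produced by a child operator. It has no
  // children, so it is deterministic by itself, but it remembers its source.
  final case class AttrRef(name: String, source: Alias) extends Expr {
    def children: Seq[Expr] = Nil
    def deterministic: Boolean = true
    override lazy val containsIndeterministicComponent: Boolean =
      source.containsIndeterministicComponent
  }

  def main(args: Array[String]): Unit = {
    val salted = Alias(Multiply(Rand, Lit(10000000L)), "pkLeft")
    val joinKey = AttrRef("pkLeft", salted)
    println(joinKey.deterministic)                    // true
    println(joinKey.containsIndeterministicComponent) // true
    val plainKey = AttrRef("pkRight", Alias(Lit(1L), "pkRight"))
    println(plainKey.containsIndeterministicComponent) // false
  }
}
```

Because the flag is a lazy val, it is computed once per node on first use, so a partitioner or a join check could consult it on its key expressions without re-walking the tree each time.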
>
> And on a personal note, thanks for your interest; this is a very rare
> attitude.
> Regards
> Asif
>
> On Sun, Jan 26, 2025, 9:45 PM Ángel <angel.alvarez.pas...@gmail.com>
> wrote:
>
>> Hi Asif,
>>
>> Could you provide an example (code + dataset) to analyze this? Looks
>> interesting...
>>
>>
>> Regards,
>> Ángel
>>
>> On Sun, Jan 26, 2025 at 20:58, Asif Shahid (<asif.sha...@gmail.com>)
>> wrote:
>>
>>> Hi,
>>> On further thought, I concur that leaf expressions like AttributeRefs
>>> can always be considered deterministic because, like a Java variable,
>>> the value they hold is invariant within an iteration (except when changed
>>> by some deterministic logic). So in that sense, what I called an issue in
>>> the mail above is incorrect.
>>> But I think AttributeRef should have a boolean method that tells whether
>>> the value it represents comes from an indeterminate source.
>>> Regards
>>> Asif
>>>
>>>
>>>
>>> On Fri, Jan 24, 2025 at 5:18 PM Asif Shahid <asif.sha...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> While testing a use case where the query had an outer join such that
>>>> joining key of left outer table either had a valid value or a random value(
>>>> salting to avoid skew).
>>>> The case was reported to have incorrect results in case of node
>>>> failure, with retry.
>>>> On debugging the code, have found following, which has left me confused
>>>> as to what is spark's strategy for indeterministic fields.
>>>> Some serious issues are :
>>>> 1) All leaf expressions, like AttributeReference, are always considered
>>>> deterministic. This means that if an attribute points to an Alias which
>>>> is itself indeterministic, the attribute will still be considered
>>>> deterministic.
>>>> 2) In CheckAnalysis there is code which checks whether each operator
>>>> supports indeterministic values. Join is not in the list of supported
>>>> operators, yet the check passes even if the joining key points to an
>>>> indeterministic alias. (When I tried fixing it, I found a plethora of
>>>> operators failing, like DeserializedObject, LocalRelation, etc., which
>>>> are not supposed to contain indeterministic attributes because they are
>>>> not in the list of supporting operators.)
>>>> 3) ShuffleDependency does not check the indeterministic nature of the
>>>> partitioner (I fixed it locally and then realized that bug #1 needs to
>>>> be fixed too).
>>>>
>>>> The code in DagScheduler / TaskSet, TaskScheduler, etc. seems to have
>>>> been written with the indeterministic nature of the previous and current
>>>> stages in mind, so as to re-execute previous stages as a whole instead
>>>> of just the missing tasks, but the above 3 points do not seem to support
>>>> that code in DagScheduler / TaskScheduler.
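A simplified sketch of the retry decision described above, assuming a two-valued determinism flag. `RetryPolicySketch`, `Determinism` and `partitionsToRecompute` are hypothetical names and do not mirror the actual DagScheduler API; the sketch only shows why an indeterminate map stage has to be recomputed as a whole rather than task by task.

```scala
// Simplified model of the retry decision; not Spark's DAGScheduler API.
// All names are hypothetical.
object RetryPolicySketch {
  sealed trait Determinism
  case object Determinate extends Determinism
  case object Indeterminate extends Determinism

  /** Map partitions to recompute when only the partitions in `available` still have output. */
  def partitionsToRecompute(numPartitions: Int, available: Set[Int],
                            level: Determinism): List[Int] = level match {
    // Rerunning a determinate task reproduces the same rows in the same
    // buckets, so only the lost partitions need to run again.
    case Determinate => (0 until numPartitions).filterNot(available).toList
    // A rerun may route rows to different buckets, so old and new output must
    // not be mixed: every partition reruns. Consumers that already read the
    // old output are affected too; this sketch does not model them.
    case Indeterminate => (0 until numPartitions).toList
  }

  def main(args: Array[String]): Unit = {
    val available = Set(0, 2, 3)
    println(partitionsToRecompute(4, available, Determinate))   // List(1)
    println(partitionsToRecompute(4, available, Indeterminate)) // List(0, 1, 2, 3)
  }
}
```

If the determinism flag is computed wrongly (points 1 and 3 above), the first branch is taken for a salted shuffle, which is exactly the partial retry that loses rows.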
>>>>
>>>> Regards
>>>> Asif
>>>>
>>>>
