On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <[email protected]> wrote:
>
> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
> Shuffle+Repartition on an RDD could lead to incorrect answers
> It turns out to be a very complicated issue, there is no consensus about
> what is the right fix yet. Likely to miss it in Spark 2.4 because it's a
> long-standing issue, not a regression.
>
This is a really serious data loss bug. Yes, it's very complex, but we absolutely have to fix this, and I really think the fix should be in 2.4. Has work on it stopped?
