Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

Prem Sahoo Thu, 29 Feb 2024 11:54:22 -0800

Hello Dongjoon,
Thanks for emailing me.
Could you please share the list of fixes? The link you provided is
not working.


On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun <[email protected]> wrote:

> Hi,
>
> If you are observing correctness issues, you may be hitting some old (and
> already fixed) correctness issues.
>
> For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness
> issues.
>
>
> https://issues.apache.org/jira/issues/?filter=12345390&jql=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)
>
> There are more fixes in 3.3, 3.4, and 3.5, too.
>
> Please use the latest version, Apache Spark 3.5.1, because Apache Spark
> 3.2 and 3.3 have reached End-Of-Support status in the community.
>
> It would be helpful if you could report any correctness issues you see
> with Apache Spark 3.5.1.
>
> Thanks,
> Dongjoon.
>
> On 2024/02/29 15:04:41 Prem Sahoo wrote:
> > When a Spark job hits a FetchFailedException, it produces some duplicate
> > records and some records go missing. Please explain why. In our scenario,
> > the job reported FetchFailedException because one of the data nodes was
> > rebooted while the job was running.
> >
> > As a result, we have some duplicate data and some missing data. Why is
> > Spark not handling this scenario correctly? We shouldn't lose any data,
> > and we shouldn't create duplicate data.
> >
> >
> >
> > I am using Spark 3.2.0.
> >
>
