Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

Dongjoon Hyun Thu, 29 Feb 2024 08:27:23 -0800

Hi,

If you are observing correctness issues, you may be hitting old correctness 
bugs that have since been fixed.


For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness issues.

https://issues.apache.org/jira/issues/?filter=12345390&jql=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)

There are more fixes in 3.3, 3.4, and 3.5, too.

Please use the latest version, Apache Spark 3.5.1, because Apache Spark 3.2 and 
3.3 have reached end-of-support in the community.

It would be helpful if you could report any correctness issues that you still 
see with Apache Spark 3.5.1.
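
A quick way to confirm which release a job actually runs on before reporting 
is to compare its version string against 3.5.1. The following is only a 
minimal sketch: the helper names `parse_version` and `is_at_least` are 
hypothetical, and it assumes the version string is read from `spark.version` 
on a PySpark session.

```python
# Hypothetical helpers for illustration; not part of Spark itself.

def parse_version(version: str) -> tuple:
    """Turn a string such as '3.5.1' or '3.5.1-SNAPSHOT' into (3, 5, 1)."""
    core = version.split("-")[0]          # drop suffixes like '-SNAPSHOT'
    return tuple(int(part) for part in core.split(".")[:3])


def is_at_least(version: str, minimum: str = "3.5.1") -> bool:
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


# Assumed usage inside a PySpark job, where `spark` is a SparkSession:
#     if not is_at_least(spark.version):
#         print("Running on", spark.version, "- please retest on 3.5.1")
```

With the version from the original question, `is_at_least("3.2.0")` is false, 
so that job would need to be retested on 3.5.1 first.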

Thanks,
Dongjoon.

On 2024/02/29 15:04:41 Prem Sahoo wrote:
> When a Spark job hits a FetchFailedException, it produces some duplicate
> records and some records go missing. Please explain why. In our scenario,
> the job reports a FetchFailedException because one of the data nodes was
> rebooted while the job was running.
> 
> As a result, we have some duplicate data and some missing data. Why is Spark
> not handling this scenario correctly? We should not lose any data, and we
> should not create duplicate data.
> 
> 
> 
> I am using Spark 3.2.0.
> 
