Hello All, in the list of JIRAs I didn't find anything related to FetchFailedException.
as mentioned above:

"When a Spark job hits FetchFailedException it creates a few duplicate
records and we also see some records missing; please explain why. We have
a scenario where the Spark job fails with FetchFailedException because one
of the data nodes was rebooted in the middle of the job. As a result we
now have some duplicate records and some missing records. Why does Spark
not handle this scenario correctly? We shouldn't lose any data and we
shouldn't create any duplicate data."

We have to rerun the job to fix this data quality issue. Please let me
know why this case is not handled properly by Spark. (A minimal sketch of
the failure mode we suspect is at the bottom of this mail.)

On Thu, Feb 29, 2024 at 9:50 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Please use the URL as the full string, including the '()' part.
>
> Or you can search directly in ASF Jira with the 'Spark' project and
> three labels: 'Correctness', 'correctness', and 'data-loss'.
>
> Dongjoon
>
> On Thu, Feb 29, 2024 at 11:54 Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> Hello Dongjoon,
>> Thanks for emailing me.
>> Could you please share the list of fixes, as the link you provided is
>> not working?
>>
>> On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun <dongj...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> If you are observing correctness issues, you may be hitting some old
>>> (and already fixed) correctness issues.
>>>
>>> For example, from Apache Spark 3.2.1 to 3.2.4 we fixed 31 correctness
>>> issues:
>>>
>>> https://issues.apache.org/jira/issues/?filter=12345390&jql=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)
>>>
>>> There are more fixes in 3.3, 3.4, and 3.5, too.
>>>
>>> Please use the latest version, Apache Spark 3.5.1, because Apache
>>> Spark 3.2 and 3.3 have reached End-of-Support status in the community.
>>>
>>> It would be helpful if you could report any correctness issues you
>>> find with Apache Spark 3.5.1.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>> On 2024/02/29 15:04:41 Prem Sahoo wrote:
>>> > When a Spark job hits FetchFailedException it creates a few
>>> > duplicate records and we also see some records missing; please
>>> > explain why. We have a scenario where the Spark job fails with
>>> > FetchFailedException because one of the data nodes was rebooted in
>>> > the middle of the job.
>>> >
>>> > Now due to this we have some duplicate records and some missing
>>> > records. Why does Spark not handle this scenario correctly? We
>>> > shouldn't lose any data and we shouldn't create any duplicate data.
>>> >
>>> > I am using Spark version 3.2.0.
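P.S. For reference, here is a minimal sketch (in Scala; the class name,
path, and numbers below are made up for illustration, not taken from our
actual job) of the failure mode we suspect. As far as we understand it,
when a shuffle fetch fails Spark re-runs the map stage to regenerate the
lost shuffle blocks, but reduce tasks that already fetched the original
output are not rolled back. If the row-to-partition assignment is
non-deterministic (for example, round-robin repartition() over
non-deterministically ordered input), a retried map task can route the
same row to a different reducer than the first attempt did, which would
explain seeing both duplicate and missing rows after a node reboot. If I
recall correctly, SPARK-23207 covered the repartition() variant of this.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

object FetchFailedSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fetch-failed-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = spark.range(0, 1000000L).toDF("id")

    // Risky: each map task's input order differs between the first
    // attempt and a retry, so round-robin repartition() can send the
    // same row to a different reducer on re-run. After a partial shuffle
    // loss (e.g. a rebooted data node), this can surface downstream as
    // duplicate and missing rows.
    val risky = df.orderBy(rand()).repartition(200)

    // Safer: hash-partitioning on a key is deterministic, so a retried
    // map task routes every row to the same reducer as the first attempt.
    val safe = df.repartition(200, $"id")

    safe.write.mode("overwrite").parquet("/tmp/out") // hypothetical path

    spark.stop()
  }
}

If that matches our pipeline, pinning the shuffle to a deterministic key
(or upgrading to a release with the relevant fixes, as suggested above)
should remove both symptoms.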