Can you share some stats from the web UI from just before the failure? Were there any earlier errors before the FileNotFoundException (FNFE)?
Jacek

On 4 Jul 2016 12:34 p.m., "kishore kumar" <akishore...@gmail.com> wrote:

> @jacek: It is running in yarn-client mode; our code doesn't support running
> in yarn-cluster mode. The job runs for around an hour and then gives
> the exception.
>
> @karhi: The YARN application status is successful, and the ResourceManager
> logs did not give any failure info except:
>
> 16/07/04 00:27:57 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 16/07/04 00:27:57 INFO storage.MemoryStore: MemoryStore cleared
> 16/07/04 00:27:57 INFO storage.BlockManager: BlockManager stopped
> 16/07/04 00:27:57 WARN executor.CoarseGrainedExecutorBackend: An unknown (slave1.domain.com:56055) driver disconnected.
> 16/07/04 00:27:57 ERROR executor.CoarseGrainedExecutorBackend: Driver 173.36.88.26:56055 disassociated! Shutting down.
> 16/07/04 00:27:57 INFO util.ShutdownHookManager: Shutdown hook called
> 16/07/04 00:27:57 INFO util.ShutdownHookManager: Deleting directory /opt/mapr/tmp/hadoop-tmp/hadoop-mapr/nm-local-dir/usercache/user/appcache/application_1467474162580_29353/spark-9c0bfccc-74c3-4541-a2fd-19101e47b49a
> End of LogType:stderr
>
> On Mon, Jul 4, 2016 at 3:20 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> You seem to be using yarn. Is this cluster or client deploy mode? Have
>> you seen any other exceptions before? How long did the application run
>> before the exception?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> On Mon, Jul 4, 2016 at 10:57 AM, kishore kumar <akishore...@gmail.com> wrote:
>> > We've upgraded Spark from 1.2 to 1.6 and still have the same problem:
>> >
>> > Exception in thread "main" org.apache.spark.SparkException: Job aborted due
>> > to stage failure: Task 286 in stage 2397.0 failed 4 times, most recent
>> > failure: Lost task 286.3 in stage 2397.0 (TID 314416, salve-06.domain.com):
>> > java.io.FileNotFoundException:
>> > /opt/mapr/tmp/hadoop-tmp/hadoop-mapr/nm-local-dir/usercache/user1/appcache/application_1467474162580_29353/blockmgr-bd075392-19c2-4cb8-8033-0fe54d683c8f/12/shuffle_530_286_0.index.c374502a-4cf2-4052-abcf-42977f1623d0
>> > (No such file or directory)
>> >
>> > Kindly help me get rid of this.
>> >
>> > On Sun, Jun 5, 2016 at 9:43 AM, kishore kumar <akishore...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Could anyone help me with this error? Why does this error occur?
>> >>
>> >> Thanks,
>> >> KishoreKuamr.
>> >>
>> >> On Fri, Jun 3, 2016 at 9:12 PM, kishore kumar <akishore...@gmail.com> wrote:
>> >>>
>> >>> Hi Jeff Zhang,
>> >>>
>> >>> Thanks for the response. Could you explain why this error occurs?
>> >>>
>> >>> On Fri, Jun 3, 2016 at 6:15 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>> >>>>
>> >>>> One quick solution is to use Spark 1.6.1.
>> >>>>
>> >>>> On Fri, Jun 3, 2016 at 8:35 PM, kishore kumar <akishore...@gmail.com> wrote:
>> >>>>>
>> >>>>> Could anyone help me with this issue?
>> >>>>>
>> >>>>> On Tue, May 31, 2016 at 8:00 PM, kishore kumar <akishore...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> We installed Spark 1.2.1 on a single node and are running a job in
>> >>>>>> yarn-client mode on YARN that loads data into HBase and Elasticsearch.
>> >>>>>>
>> >>>>>> The error we are encountering is:
>> >>>>>>
>> >>>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>> >>>>>> aborted due to stage failure: Task 38 in stage 26800.0 failed 4 times,
>> >>>>>> most recent failure: Lost task 38.3 in stage 26800.0 (TID 4990082,
>> >>>>>> hdprd-c01-r04-03): java.io.FileNotFoundException:
>> >>>>>> /opt/mapr/tmp/hadoop-tmp/hadoop-mapr/nm-local-dir/usercache/sparkuser/appcache/application_1463194314221_211370/spark-3cc37dc7-fa3c-4b98-aa60-0acdfc79c725/28/shuffle_8553_38_0.index
>> >>>>>> (No such file or directory)
>> >>>>>>
>> >>>>>> Any idea about this error?
>> >>>>>> --
>> >>>>>> Thanks,
>> >>>>>> Kishore.
>> >>>>>
>> >>>>> --
>> >>>>> Thanks,
>> >>>>> Kishore.
>> >>>>
>> >>>> --
>> >>>> Best Regards
>> >>>>
>> >>>> Jeff Zhang
>> >>>
>> >>> --
>> >>> Thanks,
>> >>> Kishore.
>> >>
>> >> --
>> >> Thanks,
>> >> Kishore.
>> >
>> > --
>> > Thanks,
>> > Kishore.
>
> --
> Thanks,
> Kishore.
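
For anyone following up on the question about earlier errors before the FNFE, here is a minimal Python sketch for pulling WARN/ERROR entries out of a saved log dump. It assumes the log4j line layout seen in the executor log quoted in this thread (`yy/MM/dd HH:mm:ss LEVEL source: message`). The function name `problems_before_fnfe`, the sample lines, their ordering, and the shortened path are all made up for illustration and are not taken from the actual job.

```python
import re

# Assumed line layout, based on the executor log quoted in this thread:
# "16/07/04 00:27:57 ERROR executor.CoarseGrainedExecutorBackend: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>\S+): (?P<msg>.*)$"
)


def problems_before_fnfe(lines, levels=("WARN", "ERROR")):
    """Collect (timestamp, level, source, message) tuples for entries at the
    given levels that appear before the first FileNotFoundException."""
    found = []
    for line in lines:
        if "java.io.FileNotFoundException" in line:
            break  # only entries logged before the first FNFE are of interest
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("source"),
                    match.group("msg"),
                )
            )
    return found


# Illustrative input only: the format mirrors the thread, but the ordering
# and the shortened path are hypothetical.
sample = [
    "16/07/04 00:27:57 INFO storage.BlockManager: BlockManager stopped",
    "16/07/04 00:27:57 WARN executor.CoarseGrainedExecutorBackend: "
    "An unknown (slave1.domain.com:56055) driver disconnected.",
    "16/07/04 00:27:57 ERROR executor.CoarseGrainedExecutorBackend: "
    "Driver 173.36.88.26:56055 disassociated! Shutting down.",
    "java.io.FileNotFoundException: /hypothetical/path/shuffle_530_286_0.index "
    "(No such file or directory)",
]

hits = problems_before_fnfe(sample)
for ts, level, source, msg in hits:
    print(ts, level, source, msg)
```

To run it against real output, the aggregated logs can be saved with `yarn logs -applicationId <application id> > app.log` and the file's lines passed to the function in place of `sample`. Whether the aggregated logs still contain the relevant executor output depends on the cluster's log aggregation settings.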