Re: Spark Batch checkpoint

2016-12-16 Thread Chawla,Sumit
Sorry for hijacking this thread. @irving, how do you restart a Spark job from a checkpoint? Regards, Sumit Chawla On Fri, Dec 16, 2016 at 2:24 AM, Selvam Raman wrote: > Hi, > > Actually, my requirement is to read a Parquet file that has 100 partitions. > Then I use

Re: Spark Batch checkpoint

2016-12-16 Thread Selvam Raman
Hi, Actually, my requirement is to read a Parquet file that has 100 partitions. Then I use foreachPartition to read the data and process it. My sample code: public static void main(String[] args) { SparkSession sparkSession = SparkSession.builder().appName("checkpoint verification").getOrCreate();

Re: Spark Batch checkpoint

2016-12-15 Thread Selvam Raman
I am using Java. I will try it and let you know. On Dec 15, 2016 8:45 PM, "Irving Duran" wrote: > Not sure what programming language you are using, but in Python you can do > "sc.setCheckpointDir('~/apps/spark-2.0.1-bin-hadoop2.7/checkpoint/')". > This will store checkpoints

Re: Spark Batch checkpoint

2016-12-15 Thread Irving Duran
Not sure what programming language you are using, but in Python you can do "sc.setCheckpointDir('~/apps/spark-2.0.1-bin-hadoop2.7/checkpoint/')". This will store checkpoints in that directory, which I named checkpoint. Thank You, Irving Duran On Thu, Dec 15, 2016 at 10:33 AM, Selvam Raman
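
A small note on the path in the message above: a leading "~" is normally expanded by the shell, not by the library receiving the string, so it may be safer to hand setCheckpointDir an absolute path. The sketch below is only an illustration under that assumption; the helper name resolve_checkpoint_dir is hypothetical, and no SparkContext is created here.

```python
import os


def resolve_checkpoint_dir(path):
    """Expand a leading '~' and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(path))


# Path taken from the message above; only the expansion step is added.
checkpoint_dir = resolve_checkpoint_dir(
    "~/apps/spark-2.0.1-bin-hadoop2.7/checkpoint/"
)

# With a live SparkContext named `sc` (assumed, not created in this sketch),
# the call from the message would then become:
# sc.setCheckpointDir(checkpoint_dir)
```

On a cluster, the checkpoint directory generally needs to be on storage that every executor can reach (for example HDFS), so a local home directory like this one is mainly suitable for single-machine testing.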

Spark Batch checkpoint

2016-12-15 Thread Selvam Raman
Hi, is there any provision for checkpointing in Spark batch jobs? I have a huge data set, and it takes more than 3 hours to process all of it. I currently have 100 partitions. If the job fails after two hours, let's say it has processed 70 partitions. Should I start the Spark job from the beginning, or is
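
One way to picture the scenario in this question (70 of 100 partitions done, then a failure) is to record a marker per finished partition and skip marked partitions on the next run. The sketch below is plain Python, not a Spark API and not something proposed in the thread; process_partitions, the marker file naming, and the simulated failure are all assumptions made for illustration.

```python
import os
import tempfile


def process_partitions(partition_ids, process, marker_dir):
    """Run process(pid) for every partition that has no 'done' marker yet.

    Returns the list of partition ids processed during this call.
    """
    os.makedirs(marker_dir, exist_ok=True)
    processed = []
    for pid in partition_ids:
        marker = os.path.join(marker_dir, "part-%05d.done" % pid)
        if os.path.exists(marker):
            continue  # finished in an earlier run, skip it
        process(pid)
        # Write the marker only after the partition succeeded.
        with open(marker, "w") as f:
            f.write("ok\n")
        processed.append(pid)
    return processed


def demo():
    """Simulate a run that fails at partition 70, then a resumed run."""
    marker_dir = tempfile.mkdtemp(prefix="batch-markers-")
    state = {"fail_at": 70}

    def flaky(pid):
        # Hypothetical stand-in for the real per-partition work.
        if pid == state["fail_at"]:
            raise RuntimeError("simulated failure at partition %d" % pid)

    try:
        process_partitions(range(100), flaky, marker_dir)
    except RuntimeError:
        pass  # first run dies after partitions 0..69 are marked done

    state["fail_at"] = None  # the second run succeeds
    return process_partitions(range(100), flaky, marker_dir)


resumed = demo()
```

In this sketch the second run touches only the partitions that were not marked, so the work already done is not repeated. It assumes the per-partition work is safe to redo for the one partition that was in flight when the failure happened.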