Re: Database insert happening two times

2017-10-17 Thread Harsh Choudhary
Hi @Marco, the multiple rows written are not dupes, as the current-timestamp field is different in each of them. @Ayan I checked and found that my whole code is rerun twice. Although there seems to be no error, is it possible that the cluster manager is configured to re-run it? On Tue, Oct 17, 2017 at 6:45 PM, ayan
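[For reference, on YARN the re-run behaviour asked about here is governed by the number of application attempts: a failed or lost application master is retried up to spark.yarn.maxAppAttempts times, capped by the cluster-wide yarn.resourcemanager.am.max-attempts. A minimal sketch of pinning it to a single attempt, assuming the job runs in YARN cluster mode; in practice the value is usually passed as --conf at spark-submit time, since it must be known before the application starts:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Ask YARN for a single application attempt, so a failed or restarted
    // driver is not silently re-run (re-executing the whole job end to end).
    val conf = new SparkConf().set("spark.yarn.maxAppAttempts", "1")
    val spark = SparkSession.builder().config(conf).getOrCreate()
]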

Re: Database insert happening two times

2017-10-17 Thread ayan guha
It should not be parallel exec, as the logging code is called in the driver. Have you checked if your driver is re-run by the cluster manager due to any failure or error situation? On Tue, Oct 17, 2017 at 11:52 PM, Marco Mistroni wrote: > Hi > Uh if the problem is really with
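[One way to check this from inside the job is to stamp each log entry with the YARN application attempt id: if the duplicate DynamoDB rows carry different attempt ids, the driver was re-run by the cluster manager. A sketch, assuming the job runs on YARN (the attempt id is absent in local mode); writeLog and data_df are hypothetical stand-ins for the thread's actual logging call and DataFrame:

    // hypothetical stand-in for the real DynamoDB put done in the driver
    def writeLog(count: Long, ts: Long, attempt: String): Unit =
      println(s"log: count=$count ts=$ts attempt=$attempt")

    // applicationAttemptId is defined on YARN; None when running locally
    val attemptId = spark.sparkContext.applicationAttemptId.getOrElse("local")
    writeLog(data_df.count(), System.currentTimeMillis(), attemptId)
]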

Re: Database insert happening two times

2017-10-17 Thread Marco Mistroni
Hi, if the problem is really with parallel exec you can try to call repartition(1) before you save. Alternatively, try to store the data in a CSV file and see if you get the same behaviour, to exclude DynamoDB issues. Also, are the multiple rows being written dupes (do they all have the same fields/values)? Hth On
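[For concreteness, the two experiments suggested above might look like this — a sketch, with data_df and the output paths as placeholders for the real ones:

    // collapse to a single partition before saving, to rule out parallel writers
    data_df.repartition(1).write.mode("append").parquet("/tmp/dupe-check.parquet")

    // or write the same data as CSV, to see whether the duplication is DynamoDB-specific
    data_df.repartition(1).write.option("header", "true").csv("/tmp/dupe-check.csv")
]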

Re: Database insert happening two times

2017-10-17 Thread Harsh Choudhary
This is the code -

    hdfs_path =   // value elided in the archive preview
    if (hdfs_path.contains(".avro")) {
      data_df = spark.read.format("com.databricks.spark.avro").load(hdfs_path)
    } else if (hdfs_path.contains(".tsv")) {
      data_df = spark.read.option("delimiter", "\t").option("header", "true").csv(hdfs_path)
    } else

Re: Database insert happening two times

2017-10-17 Thread ayan guha
Can you share your code? On Tue, 17 Oct 2017 at 10:22 pm, Harsh Choudhary wrote: > Hi > > I'm running a Spark job in which I am appending new data into a Parquet > file. At the end, I make a log entry in my DynamoDB table stating the number > of records appended, the time, etc.

Database insert happening two times

2017-10-17 Thread Harsh Choudhary
Hi I'm running a Spark job in which I am appending new data into a Parquet file. At the end, I make a log entry in my DynamoDB table stating the number of records appended, the time, etc. Instead of one single entry in the database, multiple entries are being made to it. Is it because of parallel execution
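[The shape of the job being described is roughly the following — a sketch with hypothetical names (the output path and the putLogItem helper are placeholders; the actual code appears elsewhere in the thread):

    // stand-in for the single DynamoDB log write done at the end of the job
    def putLogItem(count: Long, ts: Long): Unit =
      println(s"log entry: appended=$count at=$ts")

    // append the new batch to the existing Parquet data
    data_df.write.mode("append").parquet("/data/output.parquet")

    // one driver-side log entry: record count, timestamp, etc.
    putLogItem(data_df.count(), System.currentTimeMillis())

Since the log write sits in the driver and executes once per application run, two rows in the table point to the whole application having run twice rather than to parallel tasks.]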