Hi, 

I am iteratively receiving files, each of which can only be opened as a
Pandas dataframe. I convert the first file to a Spark dataframe using the
createDataFrame function. I convert each subsequent file the same way and
union it into the first Spark dataframe (the schema always stays the
same). After each union, I persist the result in memory (MEMORY_AND_DISK
storage level). Once all files have been merged into a single Spark
dataframe, I coalesce it, following some tips from this Stack Overflow
post:
https://stackoverflow.com/questions/39381183/managing-spark-partitions-after-dataframe-unions
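
For concreteness, the loop described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
helper name merge_pandas_frames and its parameters are hypothetical, and
the storage level and partition count are values the caller would choose.
Only createDataFrame, union, persist and coalesce are actual Spark calls.

```python
def merge_pandas_frames(spark, pandas_frames, storage_level, num_partitions):
    """Convert each Pandas frame to Spark, union as we go, then coalesce.

    Hypothetical helper: `spark` is a SparkSession, `pandas_frames` is any
    iterable of Pandas dataframes that share one schema.
    """
    merged = None
    for pdf in pandas_frames:
        sdf = spark.createDataFrame(pdf)  # Pandas -> Spark conversion
        if merged is None:
            merged = sdf  # the first file becomes the base dataframe
        else:
            # Append the new rows, then persist the running result.
            merged = merged.union(sdf).persist(storage_level)
    if merged is None:
        raise ValueError("no input dataframes were supplied")
    # Unions add up partitions, so reduce the count once at the end.
    return merged.coalesce(num_partitions)


# Example call (needs pyspark installed; the values are assumptions):
#   from pyspark import StorageLevel
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   merged = merge_pandas_frames(spark, frames, StorageLevel.MEMORY_AND_DISK, 8)
```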
   

Any suggestions for optimizing this process further?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Merging-multiple-Pandas-dataframes-tp28770.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
