Hi Team, I've followed the Spark community for several years, and this is my first time asking for help. I hope you can share some experience.
I want to develop a Spark application that processes a SQL script file. The data lives in BigQuery. For example, the script is:

delete from tableA;
insert into tableA select b.columnB1, c.columnC2 from tableB b, tableC c;

I can parse this file. In my opinion, after parsing, the steps should be the following (a rough sketch of these steps is at the end of this mail):

#step1: read tableB and tableC into memory (Spark)
#step2: register temporary views for tableB's DataFrame and tableC's DataFrame
#step3: use spark.sql("select b.columnB1, c.columnC2 from tableB b, tableC c") to get a new DataFrame
#step4: write the new DataFrame to tableA with mode "overwrite"

My questions:

#1 If the script touches 10 or more tables, do I really need to read every one of them into memory, given that Spark computes in memory?
#2 Is there an easier way to handle this scenario? Ideally I would only define the data source (BigQuery) and parse the SQL script, and Spark would handle the rest.

Please share your experience or ideas.
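To make my idea concrete, here is a minimal sketch of the four steps, assuming the Google spark-bigquery-connector is on the classpath; the dataset name "mydataset" and the GCS bucket "my-temp-bucket" are just placeholders:

import org.apache.spark.sql.SparkSession

object SqlScriptRunner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bq-sql-script")
      .getOrCreate()

    // step1: read the source tables from BigQuery (reads are lazy; nothing is
    // pulled until an action runs)
    val tableB = spark.read.format("bigquery")
      .option("table", "mydataset.tableB")   // placeholder dataset/table
      .load()
    val tableC = spark.read.format("bigquery")
      .option("table", "mydataset.tableC")
      .load()

    // step2: register temp views so the parsed SQL can refer to the tables by name
    tableB.createOrReplaceTempView("tableB")
    tableC.createOrReplaceTempView("tableC")

    // step3: run the SELECT part of the parsed script
    val result = spark.sql(
      "select b.columnB1, c.columnC2 from tableB b, tableC c")

    // step4: overwrite tableA, which covers the delete + insert in the script
    result.write.format("bigquery")
      .option("table", "mydataset.tableA")
      .option("temporaryGcsBucket", "my-temp-bucket") // placeholder bucket for indirect writes
      .mode("overwrite")
      .save()

    spark.stop()
  }
}

This sketch works for my small example, but it still registers one view per table, which is exactly what question #1 is about when the script references many tables.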