Hi Team,

I've followed the Spark community for several years. This is my first time
asking for help, and I hope you can share some of your experience.

I want to develop a Spark application that processes a SQL script file. The
data lives in BigQuery.
For example, the SQL script is:

delete from tableA;
insert into tableA select b.columnB1, c.columnC2 from tableB b, tableC c;


I can parse this file. In my opinion, after parsing it, the steps should be
as follows (a rough sketch is shown after the list):

#step1: read tableB and tableC into Spark as DataFrames
#step2: register temp views for the tableB and tableC DataFrames
#step3: use spark.sql("select b.columnB1, c.columnC2 from tableB b, tableC
c") to get a new DataFrame
#step4: write the new DataFrame back to tableA with mode "overwrite"
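
Here is a rough sketch of what I mean, assuming the spark-bigquery connector
is on the classpath; the project, dataset, and bucket names below are only
placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SqlScriptRunner").getOrCreate()

// step1: read the source tables from BigQuery as DataFrames
val tableB = spark.read.format("bigquery")
  .option("table", "myproject.mydataset.tableB").load()
val tableC = spark.read.format("bigquery")
  .option("table", "myproject.mydataset.tableC").load()

// step2: register temp views so the parsed SQL can refer to them by name
tableB.createOrReplaceTempView("tableB")
tableC.createOrReplaceTempView("tableC")

// step3: run the SELECT part of the parsed statement
val result = spark.sql(
  "select b.columnB1, c.columnC2 from tableB b, tableC c")

// step4: overwrite tableA, which replaces the delete + insert pair
result.write.format("bigquery")
  .option("table", "myproject.mydataset.tableA")
  .option("temporaryGcsBucket", "my-temp-bucket") // assuming the indirect write method
  .mode("overwrite")
  .save()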

My questions:
#1 If there are 10 or more tables, do I need to read each table into memory,
given that Spark is based on in-memory computation?
#2 Is there a much easier way to handle my scenario? For example, I just
define the data source (BigQuery) and parse the SQL script file, and Spark
handles the rest. (A rough sketch of what I mean is below.)
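
For #2, what I imagine is something like this: the parser just hands back the
table names it finds, and I register a view for each one; since DataFrames
are lazy, nothing should actually be loaded until the final write runs. The
dataset name is again a placeholder:

// table names produced by my parser
val tablesInScript = Seq("tableB", "tableC")

tablesInScript.foreach { name =>
  spark.read.format("bigquery")
    .option("table", s"myproject.mydataset.$name")
    .load()
    .createOrReplaceTempView(name)
}

Is that the right direction, or is there something simpler?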

Please share your experience or ideas.
