Why does "cache table a as select * from b" trigger a shuffle and create 2 stages?

 

Example:

The table "ods_pay_consume" is built from a stream created with "KafkaUtils.createDirectStream".

         hiveContext.sql("cache table dwd_pay_consume as select * from ods_pay_consume")

This statement produces a DAG with 2 stages.

 

         hiveContext.sql("cache table dw_game_server_recharge as select * from dwd_pay_consume")

This statement also produces a DAG with 2 stages. In the DAG Visualization tool it appears to recompute everything from the beginning, as if "cache table dwd_pay_consume" had no effect.
