Version: Spark 2.1.1, CarbonData 1.1.1, Hadoop 2.7.2

Test tables:
    xitest2   amount of data: 2 billion
    xitemp2   amount of data: 0
    xitemp    amount of data: 950

Run SQL (subquery on xitemp2):
cc.sql("update xitest2 a set (a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum)=(select b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum from xitemp2 b where b.pkid=a.pkid)").show
Shuffle read:  336.5 KB
Shuffle write: 336.5 KB
 
Run SQL (subquery on xitemp):
cc.sql("update xitest2 a set (a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum)=(select b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum from xitemp b where b.pkid=a.pkid)").show
Shuffle read:  1224 MB
Shuffle write: 2.4 GB

When an update uses a subquery and the subquery data is 0, the shuffle is too large. Can this be optimized?


yixu2001
