lordk911 commented on issue #5236:
URL: https://github.com/apache/kyuubi/issues/5236#issuecomment-1704662042
@ulysses-you
1. I've changed spark-defaults.conf to:
spark.sql.extensions org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.kyuubi.sql.KyuubiSparkSQLExtension
2. Then I connected to Kyuubi:
2.1) set spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true;
2.2) executed the test SQL (an insert into select)
2.3) before the InsertIntoHadoopFsRelationCommand there was an Exchange
node with RoundRobinPartitioning

2.4) about 20 minutes later I canceled the query, because the shuffle write
data size became much larger when using Spark 3.2.3 with KyuubiSparkSQLExtension
2.5) set spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=false;
2.6) executed the test SQL (insert into select) again
2.7) the SQL finished with the same output data size and file count as
using Spark directly without KyuubiSparkSQLExtension.
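The steps above can be sketched as a single SQL session against Kyuubi; this is a minimal illustration, and the table names `dst` and `src` are hypothetical placeholders, not taken from the report:

```sql
-- Turn the Kyuubi insert-repartition optimization on at session level.
SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true;

-- Inspect the physical plan: with KyuubiSparkSQLExtension loaded, an
-- Exchange node with RoundRobinPartitioning appears before
-- InsertIntoHadoopFsRelationCommand. (dst and src are hypothetical tables.)
EXPLAIN FORMATTED INSERT INTO dst SELECT * FROM src;

-- Turn it off and rerun: the extra shuffle is gone, and the output data
-- size and file count match plain Spark without the extension.
SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=false;
INSERT INTO dst SELECT * FROM src;
```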
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]