lordk911 commented on issue #5236:
URL: https://github.com/apache/kyuubi/issues/5236#issuecomment-1705087975
```
CREATE TABLE test.items
USING parquet
AS
SELECT id AS i_item_id,
CAST(rand() * 1000 AS INT) AS i_price
FROM RANGE(30000000);
CREATE TABLE test.sales
USING parquet
AS
SELECT CASE WHEN rand() < 0.8 THEN 100 ELSE CAST(rand() * 30000000 AS INT)
END AS s_item_id,
CAST(rand() * 100 AS INT) AS s_quantity,
DATE_ADD(current_date(), - CAST(rand() * 360 AS INT)) AS s_date
FROM RANGE(1000000000);
CREATE TABLE IF NOT EXISTS test.aqe_kyuubi_extendtion (
  s_date string,
  total_sales long
) STORED AS parquet;

SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true;
TRUNCATE TABLE test.aqe_kyuubi_extendtion;
INSERT INTO test.aqe_kyuubi_extendtion
SELECT s_date, s_quantity * i_price AS total_sales
FROM test.sales
JOIN test.items ON s_item_id = i_item_id;
```
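
One way to verify the extension actually kicks in (my own suggestion, not something from the original report) is to inspect the physical plan of the insert before running it; with the Kyuubi Spark SQL extension loaded and `spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true`, a rebalance/exchange node is expected to appear above the write command:

```
-- Hypothetical check: the optimized plan should show a rebalance/exchange
-- operator injected just above the write when the rule is active.
EXPLAIN EXTENDED
INSERT INTO test.aqe_kyuubi_extendtion
SELECT s_date, s_quantity * i_price AS total_sales
FROM test.sales
JOIN test.items ON s_item_id = i_item_id;
```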

```
SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=false;
TRUNCATE TABLE test.aqe_kyuubi_extendtion;
INSERT INTO test.aqe_kyuubi_extendtion
SELECT s_date, s_quantity * i_price AS total_sales
FROM test.sales
JOIN test.items ON s_item_id = i_item_id;
```
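
To quantify the difference between the two runs (again an assumption on my side, not part of the original repro), the number of parquet files backing the table can be counted with Spark's built-in `input_file_name()` function; fewer, larger files are the expected outcome when the repartition-before-write rule is active:

```
-- Count the distinct parquet files produced by the preceding insert;
-- run once after the enabled=true insert and once after enabled=false.
SELECT COUNT(DISTINCT input_file_name()) AS num_files
FROM test.aqe_kyuubi_extendtion;
```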

Will this help?
I also tested with both
`spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true` and
`spark.sql.optimizer.inferRebalanceAndSortOrders.enabled=true` set, and got
the same result as setting only
`spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true`.