lordk911 commented on issue #5236: URL: https://github.com/apache/kyuubi/issues/5236#issuecomment-1713257122
> ```sql
> CREATE TABLE test.items
> USING parquet
> AS
> SELECT id AS i_item_id,
>        CAST(rand() * 1000 AS INT) AS i_price
> FROM RANGE(30000000);
>
> CREATE TABLE test.sales
> USING parquet
> AS
> SELECT CASE WHEN rand() < 0.8 THEN 100 ELSE CAST(rand() * 30000000 AS INT) END AS s_item_id,
>        CAST(rand() * 100 AS INT) AS s_quantity,
>        DATE_ADD(current_date(), - CAST(rand() * 360 AS INT)) AS s_date
> FROM RANGE(1000000000);
>
> CREATE TABLE IF NOT EXISTS test.aqe_kyuubi_extendtion(
>   s_date string,
>   total_sales long
> ) STORED AS parquet;
>
> SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true;
>
> TRUNCATE TABLE test.aqe_kyuubi_extendtion;
>
> INSERT INTO test.aqe_kyuubi_extendtion
> SELECT s_date, s_quantity * i_price AS total_sales
> FROM test.sales
> JOIN test.items ON s_item_id = i_item_id;
> ```

I think I've given the example data above. When using KyuubiSparkSQLExtension, the number of output files is 69 with a total size of 2.7 GB; without KyuubiSparkSQLExtension, the number of output files is 49 with a total size of 2.5 GB.
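For reference, a quick back-of-the-envelope check of the average output file size implied by those figures (computed from the counts and totals reported above, nothing else assumed):

```python
# Average output file size implied by the reported totals:
# 69 files / 2.7 GB with KyuubiSparkSQLExtension,
# 49 files / 2.5 GB without it.

def avg_file_size_mb(total_gb: float, num_files: int) -> float:
    """Average file size in MiB, given a total size in GiB."""
    return total_gb * 1024 / num_files

with_ext = avg_file_size_mb(2.7, 69)      # with the extension
without_ext = avg_file_size_mb(2.5, 49)   # without the extension

print(f"with extension:    {with_ext:.1f} MiB/file")
print(f"without extension: {without_ext:.1f} MiB/file")
```

So the extension here produced more but smaller files (roughly 40 MiB vs 52 MiB on average), which is the behavior being questioned.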
