lordk911 commented on issue #5236: URL: https://github.com/apache/kyuubi/issues/5236#issuecomment-1713257122
> ```sql
> CREATE TABLE test.items
> USING parquet
> AS
> SELECT id AS i_item_id,
>        CAST(rand() * 1000 AS INT) AS i_price
> FROM RANGE(30000000);
>
> CREATE TABLE test.sales
> USING parquet
> AS
> SELECT CASE WHEN rand() < 0.8 THEN 100 ELSE CAST(rand() * 30000000 AS INT) END AS s_item_id,
>        CAST(rand() * 100 AS INT) AS s_quantity,
>        DATE_ADD(current_date(), - CAST(rand() * 360 AS INT)) AS s_date
> FROM RANGE(1000000000);
>
> CREATE TABLE IF NOT EXISTS test.aqe_kyuubi_extendtion(
>   s_date string,
>   total_sales long
> ) STORED AS parquet;
>
> SET spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true;
>
> TRUNCATE TABLE test.aqe_kyuubi_extendtion;
>
> INSERT INTO test.aqe_kyuubi_extendtion
> SELECT s_date, s_quantity * i_price AS total_sales
> FROM test.sales
> JOIN test.items ON s_item_id = i_item_id;
> ```

I think I've given the example data above. When using KyuubiSparkSQLExtension, the number of output files is 69 with a total size of 2.7 GB; without KyuubiSparkSQLExtension, the number of output files is 49 with a total size of 2.5 GB.
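For reference, a quick back-of-the-envelope check of the average output file size implied by those figures (computed from the counts and totals reported above, nothing else assumed):

```python
# Average output file size implied by the reported totals:
# 69 files / 2.7 GB with KyuubiSparkSQLExtension,
# 49 files / 2.5 GB without it.

def avg_file_size_mb(total_gb: float, num_files: int) -> float:
    """Average file size in MiB, given a total size in GiB."""
    return total_gb * 1024 / num_files

with_ext = avg_file_size_mb(2.7, 69)      # with the extension
without_ext = avg_file_size_mb(2.5, 49)   # without the extension

print(f"with extension:    {with_ext:.1f} MiB/file")
print(f"without extension: {without_ext:.1f} MiB/file")
```

So the extension here produced more but smaller files (roughly 40 MiB vs 52 MiB on average), which is the behavior being questioned.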
