boneanxs commented on PR #11059:
URL: https://github.com/apache/incubator-gluten/pull/11059#issuecomment-3530591528

   > @boneanxs Thanks for following.
   > 
   > > I tested it and found that it still produces less shuffle data than rss_sort.
   > 
   > Could you also compare with vanilla Spark + Celeborn?
   > 
   > > I can add more tests to measure how much the data volume increases compared to buffering the entire partition.
   > 
   > It would be nice to see more performance results shared.
   
   
   @marin-ma I ran a test query on TPC-DS (3TB) using the following SQL:
   
   ```sql
   SELECT
     ss.ss_customer_sk,
     ss.ss_item_sk,
     ss.ss_ticket_number,
     ss.ss_store_sk,
     ss.ss_promo_sk,
     ss.ss_sold_date_sk,
     c.c_customer_id,
     c.c_first_name,
     c.c_last_name,
     SUM(ss.ss_net_paid) AS total_paid
   FROM (SELECT /*+ REPARTITION(50) */ * FROM store_sales) ss
   LEFT JOIN customer c
     ON ss.ss_customer_sk = c.c_customer_sk
   GROUP BY
     ss.ss_customer_sk,
     ss.ss_item_sk,
     ss.ss_ticket_number,
     ss.ss_store_sk,
     ss.ss_promo_sk,
     ss.ss_sold_date_sk,
     c.c_customer_id,
     c.c_first_name,
     c.c_last_name
   LIMIT 100;
   ```
   
   Comparing the first shuffle stage, I didn't observe any performance regression with this patch applied.
   
   1. Vanilla Spark
   <img width="422" height="529" alt="Screenshot 2025-11-14 at 10 35 33" src="https://github.com/user-attachments/assets/b5186272-cacb-4512-af76-c2ac75f36a41" />
   
   2. Buffering all partitions + 512m off-heap
   <img width="405" height="520" alt="Screenshot 2025-11-14 at 10 37 34" src="https://github.com/user-attachments/assets/af2e419d-5eac-4311-abbb-1e4bdc804a12" />
   
   3. With this patch + 512m off-heap
   <img width="402" height="520" alt="Screenshot 2025-11-14 at 10 38 38" src="https://github.com/user-attachments/assets/f1b8f6b8-d836-4380-9e37-6f73b8c4ca37" />
   
   4. With this patch + 256m off-heap (to force more spill)
   <img width="403" height="520" alt="Screenshot 2025-11-14 at 10 41 57" src="https://github.com/user-attachments/assets/073208c1-5467-43d1-aba7-56e1c5a5a6bf" />
   
   By counting the `VeloxCelebornColumnarShuffleWriter: Gluten shuffle writer: Trying to push` log messages for the same partition, I can confirm that more spilling happens when the off-heap memory is reduced.
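The counting step above can be scripted. The following is a minimal Python sketch, not the procedure used in the comment: it assumes the executor log for a run has been saved as plain text, and it treats each line containing the quoted `Trying to push` message as one push attempt. The `task_filter` argument is hypothetical; the actual log layout may need a different way of isolating a single partition.

```python
from typing import Iterable, Optional

# Log message quoted in the comment; each matching line is counted as one
# push (spill) attempt by the Celeborn shuffle writer.
PUSH_MARKER = (
    "VeloxCelebornColumnarShuffleWriter: Gluten shuffle writer: Trying to push"
)


def count_push_attempts(lines: Iterable[str], task_filter: Optional[str] = None) -> int:
    """Count lines containing PUSH_MARKER.

    If `task_filter` is given (a hypothetical substring identifying one
    partition or task), only lines that also contain it are counted.
    """
    count = 0
    for line in lines:
        if PUSH_MARKER not in line:
            continue
        if task_filter is not None and task_filter not in line:
            continue
        count += 1
    return count
```

Usage, with a hypothetical file name: `with open("executor.log") as f: n = count_push_attempts(f)`. Running this once on the 512m log and once on the 256m log gives two counts that can be compared directly.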
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
