boneanxs commented on PR #11059:
URL:
https://github.com/apache/incubator-gluten/pull/11059#issuecomment-3530591528
> @boneanxs Thanks for following.
>
> > I tested it and found that it still produces less shuffle data than rss_sort.
>
> Could you also compare with vanilla spark + celeborn?
>
> > I can add more tests to measure how much the data volume increases when compared to buffering the entire partition.
>
> It would be nice if there are more performance results to be shared.
@marin-ma I ran a test query on TPC-DS (3TB) using the following SQL:
```sql
SELECT
  ss.ss_customer_sk,
  ss.ss_item_sk,
  ss.ss_ticket_number,
  ss.ss_store_sk,
  ss.ss_promo_sk,
  ss.ss_sold_date_sk,
  c.c_customer_id,
  c.c_first_name,
  c.c_last_name,
  SUM(ss.ss_net_paid) AS total_paid
FROM
  (SELECT /*+ REPARTITION(50) */ * FROM store_sales) ss
  LEFT JOIN customer c
    ON ss.ss_customer_sk = c.c_customer_sk
GROUP BY
  ss.ss_customer_sk,
  ss.ss_item_sk,
  ss.ss_ticket_number,
  ss.ss_store_sk,
  ss.ss_promo_sk,
  ss.ss_sold_date_sk,
  c.c_customer_id,
  c.c_first_name,
  c.c_last_name
LIMIT 100;
```
When comparing the first shuffle stage, I didn’t observe any performance
regression with this patch applied.
1. Vanilla Spark
<img width="422" height="529" alt="Screenshot 2025-11-14 at 10 35 33"
src="https://github.com/user-attachments/assets/b5186272-cacb-4512-af76-c2ac75f36a41"
/>
2. Buffering all partitions + 512m off-heap
<img width="405" height="520" alt="Screenshot 2025-11-14 at 10 37 34"
src="https://github.com/user-attachments/assets/af2e419d-5eac-4311-abbb-1e4bdc804a12"
/>
3. With this patch + 512m off-heap
<img width="402" height="520" alt="Screenshot 2025-11-14 at 10 38 38"
src="https://github.com/user-attachments/assets/f1b8f6b8-d836-4380-9e37-6f73b8c4ca37"
/>
4. With this patch + 256m off-heap (to force more spill)
<img width="403" height="520" alt="Screenshot 2025-11-14 at 10 41 57"
src="https://github.com/user-attachments/assets/073208c1-5467-43d1-aba7-56e1c5a5a6bf"
/>
By counting the occurrences of the log line
`VeloxCelebornColumnarShuffleWriter: Gluten shuffle writer: Trying to push`
for the same partition, I can confirm that more spills happen when the
off-heap memory is reduced.
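
The log-line count described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the test: it assumes the message text appears verbatim in the executor logs, and `count_push_attempts` and its `key` parameter are hypothetical names. The rest of the log-line format is not shown here, so grouping by partition is left to a caller-supplied function.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# Message text quoted in this comment; assumed to appear verbatim in the logs.
MARKER = "Gluten shuffle writer: Trying to push"


def count_push_attempts(
    lines: Iterable[str],
    key: Optional[Callable[[str], str]] = None,
) -> Counter:
    """Count log lines containing MARKER.

    If `key` is given, counts are grouped by key(line) (for example a
    partition id extracted by the caller); otherwise all matches are
    counted under the single group "all".
    """
    counts: Counter = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # not a push-attempt line
        group = key(line) if key is not None else "all"
        counts[group] += 1
    return counts
```

Passing an open executor log file as `lines` for each run and comparing the resulting counts gives a rough indication of how much more often the writer pushed data under the smaller off-heap setting.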
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]