maobaolong opened a new pull request, #2080:
URL: https://github.com/apache/incubator-uniffle/pull/2080
### What changes were proposed in this pull request?
Change the default value of `spark.rss.client.bitmap.splitNum` to 10.
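For reference, a job that wants to pin the value explicitly could presumably set the key like any other Spark property. This is a minimal sketch assuming the standard `spark-defaults.conf` format; the file location is an assumption, not something this PR specifies.

```properties
# spark-defaults.conf (assumed location; adjust to your deployment)
# Number of bitmap splits used by the RSS client; 10 is the value proposed in this PR.
spark.rss.client.bitmap.splitNum 10
```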
### Why are the changes needed?
Improve the performance of the `reportShuffleResult` and
`getShuffleResultForMultiPart` operations.
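One plausible reason a larger split count helps is reduced lock contention: if block IDs are sharded across several bitmaps, each guarded by its own lock, concurrent reports for different partitions rarely block one another. The sketch below illustrates that idea only. It is not Uniffle code; the class `ShardedBlockIdStore`, its methods, and the `partition_id % split_num` sharding rule are all hypothetical, and plain Python sets stand in for real bitmaps.

```python
import threading


class ShardedBlockIdStore:
    """Hypothetical stand-in for a block-ID store split into several shards."""

    def __init__(self, split_num: int):
        if split_num < 1:
            raise ValueError("split_num must be >= 1")
        # One set and one lock per shard; sets stand in for bitmaps here.
        self._shards = [set() for _ in range(split_num)]
        self._locks = [threading.Lock() for _ in range(split_num)]

    def _index(self, partition_id: int) -> int:
        # Assumed sharding rule: partition ID modulo the number of splits.
        return partition_id % len(self._shards)

    def report(self, partition_id: int, block_ids) -> None:
        """Record block IDs for one partition, locking only its shard."""
        i = self._index(partition_id)
        with self._locks[i]:
            self._shards[i].update((partition_id, b) for b in block_ids)

    def get(self, partition_ids) -> set:
        """Collect block IDs for several partitions, visiting only the shards involved."""
        wanted = set(partition_ids)
        result = set()
        for i in {self._index(p) for p in wanted}:
            with self._locks[i]:
                result.update(b for (p, b) in self._shards[i] if p in wanted)
        return result
```

With `split_num=1` every call contends on a single lock; with `split_num=10` reports for partitions 3 and 4 take different locks, while partitions 3 and 13 still share one.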
### Does this PR introduce _any_ user-facing change?
Yes. The default value changes, and performance improves.
### How was this patch tested?
- Prepare a 10 TB TPC-DS dataset and Hive table.
- Execute the following SQL, which generates a large amount of shuffle
data (about 17 TB).
```SQL
SELECT SUM(
         IFNULL(CAST(ss_sold_time_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_item_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_cdemo_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_hdemo_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_addr_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_store_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_promo_sk AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_ticket_number AS DECIMAL(10, 2)), 0)
       + IFNULL(CAST(ss_quantity AS DECIMAL(10, 2)), 0)
       + IFNULL(ss_wholesale_cost, 0) + IFNULL(ss_list_price, 0)
       + IFNULL(ss_sales_price, 0) + IFNULL(ss_ext_discount_amt, 0)
       + IFNULL(ss_ext_sales_price, 0) + IFNULL(ss_ext_wholesale_cost, 0)
       + IFNULL(ss_ext_list_price, 0) + IFNULL(ss_ext_tax, 0)
       + IFNULL(ss_coupon_amt, 0) + IFNULL(ss_net_paid, 0)
       + IFNULL(ss_net_paid_inc_tax, 0) + IFNULL(ss_net_profit, 0)
       ) AS sum_all_fields
FROM (
  SELECT * FROM (
    SELECT s.*, c.*
    FROM (SELECT *, floor(rand(123) * 82857000) AS sr FROM store_sales) s
    JOIN (SELECT *, floor(rand(123) * 82857000) AS cr FROM customer) c
      ON s.sr = c.cr
  ) sc
  DISTRIBUTE BY sc.ss_customer_sk, sc.ss_item_sk
)
```
- With