(incubator-uniffle) branch master updated: [#1898] improvement(docs): Add docs for malloc recommendation (#1899)

rickyma Mon, 15 Jul 2024 00:49:53 -0700

This is an automated email from the ASF dual-hosted git repository.

rickyma pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git



The following commit(s) were added to refs/heads/master by this push:
     new 64890eb19 [#1898] improvement(docs): Add docs for malloc recommendation (#1899)
64890eb19 is described below

commit 64890eb19a6b690409fcb687093c9850d90a24a6
Author: RickyMa <[email protected]>
AuthorDate: Mon Jul 15 15:48:37 2024 +0800

    [#1898] improvement(docs): Add docs for malloc recommendation (#1899)
    
    ### What changes were proposed in this pull request?
    
    Add docs for malloc recommendation.
    
    ### Why are the changes needed?
    
    For https://github.com/apache/incubator-uniffle/issues/1898.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    No need.
---
 docs/client_guide/spark_client_guide.md |  2 +-
 docs/server_guide.md                    | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/docs/client_guide/spark_client_guide.md b/docs/client_guide/spark_client_guide.md
index b08a1bdf4..3e504b51e 100644
--- a/docs/client_guide/spark_client_guide.md
+++ b/docs/client_guide/spark_client_guide.md
@@ -80,7 +80,7 @@ The important configuration is listed as following.
 
 | Property Name | Default | Description |
 |---|---|---|
-| spark.rss.writer.buffer.spill.size | 128m | Buffer size for total partition data. It is recommended to set spark.rss.writer.buffer.spill.size to 512m (default is 128m, 1g is preferable, theoretically the larger the better, but the executor's own memory should be considered, it may cause OOM when the executor's memory is not enough), This configuration can effectively improve task performance and alleviate server-side GC pressure. |
+| spark.rss.writer.buffer.spill.size | 128m | Buffer size for total partition data. It is recommended to set spark.rss.writer.buffer.spill.size to 512m (default is 128m, 1g is preferable, theoretically the larger the better, but the executor's own memory should be considered, it may cause OOM when the executor's memory is not enough), this configuration can effectively improve task performance and alleviate server-side GC pressure. |
 | spark.rss.client.send.size.limit | 16m | The max data size sent to shuffle server |
 | spark.rss.client.unregister.thread.pool.size | 10 | The max size of thread pool of unregistering |
 | spark.rss.client.unregister.request.timeout.sec | 10 | The max timeout sec when doing unregister to remote shuffle-servers |
diff --git a/docs/server_guide.md b/docs/server_guide.md
index 1dcff9eaa..fa509bc32 100644
--- a/docs/server_guide.md
+++ b/docs/server_guide.md
@@ -152,7 +152,7 @@ When enabling Netty, we should also consider memory related configurations.
 - Reserve about `15%` of the machine's memory space (reserved space for OS slab, reserved, cache, buffer, kernel stack, etc.)
 - Recommended ratio for heap memory : off-heap memory is `1 : 9`
 - `rss.server.buffer.capacity` + `rss.server.read.buffer.capacity` + reserved = maximum off-heap memory
-- Recommended ratio for capacity configurations: `rss.server.read.buffer.capacity` : `rss.server.buffer.capacity` = 1 : 18
+- Recommended ratio for capacity configurations: `rss.server.read.buffer.capacity` : `rss.server.buffer.capacity` = `1 : 18`
 
 Note: The reserved memory can be adjusted according to the actual situation, if the memory is relatively small, configuring 1g is completely sufficient.
 
@@ -213,3 +213,15 @@ rss.server.max.concurrency.of.per-partition.write 30
 rss.server.huge-partition.size.threshold 20g
 rss.server.huge-partition.memory.limit.ratio 0.2
 ```
+
+#### Malloc Recommendation
+
+We recommend using [mimalloc 2.x](https://github.com/microsoft/mimalloc). Through our tests, we found that when the off-heap memory is large (>= 300g) and the server is under high concurrent pressure, mimalloc performs better than glibc (the default malloc for most Linux systems), jemalloc, and TCmalloc. It has the lowest peak value of RSS (Resident Set Size) memory, can return memory to the operating system faster, and reduce memory fragmentation. This helps avoid issues of the server b [...]
+
+If you still find that your server's RSS memory is growing too fast and returning memory to the operating system is slow after using mimalloc, congratulations! This means your server is fully utilized and the request pressure is quite high.
+
+In this case, you can set the following parameters to allow mimalloc to return memory to the operating system at the fastest speed:
+
+```
+export MIMALLOC_PURGE_DELAY=0
+```
\ No newline at end of file
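
The memory sizing rules quoted in the docs/server_guide.md hunk above (reserve about 15% for the OS, heap : off-heap = 1 : 9, read buffer : buffer = 1 : 18) can be worked through as plain arithmetic. The sketch below is only an illustration and is not part of the commit: the function name, the dictionary keys, and the reading that the 1 : 9 split applies to the memory left after the 15% OS reservation are all assumptions, and the 1g default for the off-heap "reserved" part is taken from the note in the guide.

```python
def size_shuffle_server_memory(machine_gb, reserved_off_heap_gb=1.0):
    """Hypothetical helper: turn the guide's ratios into sizes in GB.

    Assumption: the heap : off-heap = 1 : 9 split is applied to what
    remains after about 15% of the machine memory is left to the OS.
    """
    usable_gb = machine_gb * 0.85            # leave about 15% to the OS
    heap_gb = usable_gb / 10                 # heap : off-heap = 1 : 9
    off_heap_gb = usable_gb - heap_gb

    # buffer.capacity + read.buffer.capacity + reserved = max off-heap
    capacity_gb = off_heap_gb - reserved_off_heap_gb
    if capacity_gb <= 0:
        raise ValueError("machine memory too small for the reserved part")

    # read.buffer.capacity : buffer.capacity = 1 : 18
    read_buffer_gb = capacity_gb / 19
    buffer_gb = capacity_gb - read_buffer_gb

    return {
        "heap_gb": heap_gb,
        "off_heap_gb": off_heap_gb,
        "reserved_gb": reserved_off_heap_gb,
        "read_buffer_gb": read_buffer_gb,    # rss.server.read.buffer.capacity
        "buffer_gb": buffer_gb,              # rss.server.buffer.capacity
    }


if __name__ == "__main__":
    for name, value in size_shuffle_server_memory(400).items():
        print(f"{name}: {value:.1f}")
```

The results are starting points only. As the guide's note says, the reserved part should be adjusted to the actual situation.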
