[carbondata] branch master updated: [DOC][FAQ] add faq for how to deal with slow task

jackylk Mon, 30 Dec 2019 01:42:20 -0800

This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git



The following commit(s) were added to refs/heads/master by this push:
     new 1ded13e  [DOC][FAQ] add faq for how to deal with slow task
1ded13e is described below

commit 1ded13efa0a00f9b04b0714292aedc738b2f2d8d
Author: litao <litao_xid...@126.com>
AuthorDate: Wed Dec 18 20:25:23 2019 +0800

    [DOC][FAQ] add faq for how to deal with slow task
    
    This closes #3514
---
 docs/faq.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index 9ba7082..16cdfa5 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -29,6 +29,7 @@
 * [Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
 * [Why different time zone result for select query output when query SDK writer output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
 * [How to check LRU cache memory footprint?](#how-to-check-lru-cache-memory-footprint)
+* [How to deal with the trailing task in query?](#how-to-deal-with-the-trailing-task-in-query)
 
 # TroubleShooting
 
@@ -227,6 +228,29 @@ This property will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryM
 **Note:** If  `Removed entry from InMemory LRU cache` are frequently observed in logs, you may have to increase the configured LRU size.
 
 To observe the LRU cache from heap dump, check the heap used by CarbonLRUCache class.
+
+## How to deal with the trailing task in query?
+
+When tuning query performance, users may find that a few slow tasks hold back the overall query. To improve performance in this case, lower spark.locality.wait and set spark.speculation=true. Speculation makes Spark launch additional copies of a slow task and use the result of whichever copy finishes first. The following configuration can further improve performance in this case.
+
+**Example:**
+
+```
+spark.locality.wait = 500
+spark.speculation = true
+spark.speculation.quantile = 0.75
+spark.speculation.multiplier = 5
+spark.blacklist.enabled = false
+```
+
+**Note:** 
+
+spark.locality.wait controls how long Spark waits for a data-local slot before scheduling a task at a less-local level; the value of 500 shortens this wait.
+
+spark.speculation and the related spark.speculation.* settings make Spark monitor trailing tasks and launch new copies of them when the configured conditions are met.
+
+spark.blacklist.enabled is set to false so that the blacklist mechanism does not reduce the number of available executors.
+
 ## Getting tablestatus.lock issues When loading data
 
   **Symptom**

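As an illustration of how the `spark.speculation.quantile` and `spark.speculation.multiplier` values in the FAQ entry interact, the following is a minimal sketch of the decision rule, not Spark's actual scheduler code. The function name `speculation_candidates`, its parameters, and the 100 ms floor are assumptions made for this example: a task is treated as trailing only after the given fraction of tasks has finished, and only if it has run longer than the multiplier times the median duration of the finished tasks.

```python
from statistics import median


def speculation_candidates(finished_ms, running_ms, total_tasks,
                           quantile=0.75, multiplier=5.0, floor_ms=100):
    """Return ids of running tasks that would qualify for a speculative copy.

    finished_ms: durations (ms) of tasks that have already succeeded
    running_ms:  mapping of task id -> elapsed time (ms) of running tasks
    floor_ms:    assumed lower bound on the threshold (hypothetical default)
    """
    # Nothing is speculated until a `quantile` fraction of tasks has finished.
    min_finished = max(int(quantile * total_tasks), 1)
    if len(finished_ms) < min_finished:
        return []

    # A running task counts as trailing once it exceeds
    # multiplier x the median duration of the finished tasks.
    threshold = max(multiplier * median(finished_ms), floor_ms)
    return sorted(tid for tid, elapsed in running_ms.items()
                  if elapsed > threshold)


if __name__ == "__main__":
    finished = [1000] * 6          # six of eight tasks took about 1 s each
    running = {6: 2000, 7: 6000}   # task 7 has been running for 6 s
    print(speculation_candidates(finished, running, total_tasks=8))
```

With the example values (quantile 0.75, multiplier 5), six of eight tasks must finish before any task is considered, and only task 7, which has run longer than 5 x 1000 ms, qualifies. Raising the multiplier makes speculation more conservative; lowering the quantile lets it start earlier in the stage.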