This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 1ded13e [DOC][FAQ] add faq for how to deal with slow task
1ded13e is described below
commit 1ded13efa0a00f9b04b0714292aedc738b2f2d8d
Author: litao <[email protected]>
AuthorDate: Wed Dec 18 20:25:23 2019 +0800
[DOC][FAQ] add faq for how to deal with slow task
This closes #3514
---
docs/faq.md | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/docs/faq.md b/docs/faq.md
index 9ba7082..16cdfa5 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -29,6 +29,7 @@
* [Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
* [Why different time zone result for select query output when query SDK writer output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
* [How to check LRU cache memory footprint?](#how-to-check-lru-cache-memory-footprint)
+* [How to deal with the trailing task in query?](#how-to-deal-with-the-trailing-task-in-query)
# TroubleShooting
@@ -227,6 +228,29 @@ This property will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryM
**Note:** If `Removed entry from InMemory LRU cache` is frequently observed in logs, you may have to increase the configured LRU size.
To observe the LRU cache from a heap dump, check the heap used by the CarbonLRUCache class.
+
+## How to deal with the trailing task in query?
+
+When tuning query performance, users may find that a few trailing tasks slow down the overall query progress. To improve performance in such cases, users can set `spark.locality.wait` and `spark.speculation=true` to enable speculative execution in Spark, which launches duplicate copies of slow tasks and takes the result of whichever copy finishes first. Besides, users can also consider the following configurations to further tune performance in this case.
+
+**Example:**
+
+```
+spark.locality.wait = 500
+spark.speculation = true
+spark.speculation.quantile = 0.75
+spark.speculation.multiplier = 5
+spark.blacklist.enabled = false
+```
+
+**Note:**
+
+`spark.locality.wait` controls how long Spark waits for data locality before launching a task on a less-local slot; the value of 500 (milliseconds) shortens that waiting time.
+
+The `spark.speculation` properties are a group of configurations that monitor trailing tasks and launch speculative copies of them when the configured conditions are met.
+
+`spark.blacklist.enabled` is set to false to avoid a reduction in available executors due to the blacklist mechanism.
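+
+The settings above can also be applied per application when building the `SparkSession`. A minimal sketch, assuming a Spark deployment with CarbonData on the classpath; the application name is illustrative and not part of this commit:
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder()
+  .appName("carbon-query-tuning")               // illustrative name
+  .config("spark.locality.wait", "500")         // wait at most 500 ms for a data-local slot
+  .config("spark.speculation", "true")          // re-launch copies of straggling tasks
+  .config("spark.speculation.quantile", "0.75") // check for stragglers once 75% of tasks finish
+  .config("spark.speculation.multiplier", "5")  // a task 5x slower than the median is speculatable
+  .config("spark.blacklist.enabled", "false")   // keep executors available
+  .getOrCreate()
+```
+
+The same values can equally be passed as `--conf` options to `spark-submit` or placed in `spark-defaults.conf`.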
+
## Getting tablestatus.lock issues When loading data
**Symptom**