jackylk commented on a change in pull request #3514: [FAQ]add faq for how to deal with trailing task
URL: https://github.com/apache/carbondata/pull/3514#discussion_r361909663
 
 

 ##########
 File path: docs/faq.md
 ##########
 @@ -227,6 +228,29 @@ This property will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryM
 **Note:** If `Removed entry from InMemory LRU cache` messages are frequently observed in logs, you may have to increase the configured LRU size.
 
 To observe the LRU cache from a heap dump, check the heap used by the CarbonLRUCache class.
+
+## How to deal with trailing tasks in a query?
+
+During tuning, you may find that a few tasks slow down the overall query progress. If each task processes roughly the same amount of data, the usual suspects are I/O, CPU, and network bandwidth, but investigating these rarely gives a quick answer, so a faster way to mitigate the problem is needed. Configuring spark.locality.wait and spark.speculation is one such approach: it lets a task that runs too long be retried on other nodes as soon as possible, and the result of whichever attempt finishes first is used. This may sacrifice some data locality, but actual testing shows that it helps reduce the time spent on trailing tasks.
 
 Review comment:
   ```suggestion
   When tuning query performance, users may find that a few tasks slow down the overall query progress. To improve performance in such cases, users can tune spark.locality.wait and set spark.speculation=true to enable speculative execution in Spark, which launches additional attempts of slow tasks and uses the result of whichever attempt finishes first. Users can also consider the following configurations to further improve performance in this case.
   ```
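A minimal sketch of applying the two settings named above, written in Python with only the standard library so it runs without a Spark installation. The keys `spark.locality.wait` and `spark.speculation` are real Spark configuration properties; the helper names (`TRAILING_TASK_CONF`, `to_spark_submit_args`, `to_spark_defaults`) and the `500ms` value are hypothetical, chosen for illustration rather than as a tuned recommendation for CarbonData.

```python
# Hypothetical helper: render Spark settings that mitigate trailing tasks.
# The property keys are real Spark configuration keys; the values are
# illustrative assumptions, not tuned recommendations.

TRAILING_TASK_CONF = {
    # Wait less for a data-local slot before scheduling the task on a
    # less-local node (Spark's default is 3s). Trades locality for speed.
    "spark.locality.wait": "500ms",
    # Re-launch tasks that run slowly compared with the rest of the stage;
    # the result of whichever attempt finishes first is used.
    "spark.speculation": "true",
}


def to_spark_submit_args(conf: dict) -> list:
    """Render Spark properties as `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def to_spark_defaults(conf: dict) -> str:
    """Render the same properties as spark-defaults.conf lines (`key value`)."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(" ".join(["spark-submit", *to_spark_submit_args(TRAILING_TASK_CONF)]))
    print(to_spark_defaults(TRAILING_TASK_CONF))
```

The `--conf` pairs can be appended to a spark-submit command line, or the second form placed in `conf/spark-defaults.conf`; in PySpark, the same keys can be passed through `SparkSession.builder.config(key, value)`.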

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
