Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/89#discussion_r10559136

--- Diff: docs/configuration.md ---
@@ -487,6 +477,88 @@ Apart from these, the following properties are also available, and may be useful
     </tr>
 </table>
+
+The following properties can be used to schedule cleanup jobs at different levels.
+These metadata tuning parameters should be set with great care and only where required,
+since scheduling metadata cleanup in the middle of a job can result in a lot of unnecessary re-computation.
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td>spark.cleaner.ttl</td>
+  <td>(infinite)</td>
+  <td>
+    Duration (seconds) for which Spark will remember any metadata (stages generated, tasks generated, etc.).
+    Periodic cleanups ensure that metadata older than this duration is forgotten. This is
+    useful when running Spark for many hours / days (for example, running 24/7 in the case of Spark Streaming
+    applications). Note that any RDD that persists in memory for more than this duration will be cleared as well.
+  </td>
+</tr>
+<tr>
+  <td>spark.cleaner.ttl.MAP_OUTPUT_TRACKER</td>
+  <td>spark.cleaner.ttl, with a min. value of 10 secs</td>
+  <td>
+    Cleans up the map containing the mapper information (the input block manager Id and the output result size) corresponding to a shuffle Id.
+  </td>
--- End diff --

you might want to add that this takes precedence over spark.cleaner.ttl
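As an aside for readers of this thread, here is a minimal sketch of how these TTL settings might be applied through SparkConf. The spark.cleaner.ttl.MAP_OUTPUT_TRACKER key is the one proposed in this diff, and the app name, master, and second values below are illustrative assumptions rather than recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enabling periodic metadata cleanup for a long-running application.
val conf = new SparkConf()
  .setAppName("long-running-app")   // hypothetical app name
  .setMaster("local[*]")            // illustrative; normally set via spark-submit
  // Forget stage/task metadata older than one hour (value is in seconds).
  // Persisted RDDs older than this TTL are cleared as well, so choose a
  // duration longer than any window over which cached data is reused.
  .set("spark.cleaner.ttl", "3600")
  // Key proposed in this diff; per the review comment above, a tracker-specific
  // TTL would take precedence over spark.cleaner.ttl for the map-output tracker.
  .set("spark.cleaner.ttl.MAP_OUTPUT_TRACKER", "600")

val sc = new SparkContext(conf)
```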