Github user greghogan commented on a diff in the pull request:
https://github.com/apache/flink/pull/5277#discussion_r160793612
--- Diff: docs/concepts/runtime.md ---
@@ -88,40 +94,36 @@ By default, Flink allows subtasks to share slots even if they are subtasks of di
they are from the same job. The result is that one slot may hold an entire pipeline of the
job. Allowing this *slot sharing* has two main benefits:
-  - A Flink cluster needs exactly as many task slots as the highest parallelism used in the job.
-    No need to calculate how many tasks (with varying parallelism) a program contains in total.
+  - A Flink cluster needs as many task slots as the highest parallelism used in the job.
+    There's no need to calculate how many tasks (with varying parallelism) a program contains in total.
  - It is easier to get better resource utilization. Without slot sharing, the non-intensive
-    *source/map()* subtasks would block as many resources as the resource intensive *window* subtasks.
+    *source/map()* subtasks would block as many resources as the resource-intensive *window* subtasks.
    With slot sharing, increasing the base parallelism in our example from two to six yields full utilization of the
-    slotted resources, while making sure that the heavy subtasks are fairly distributed among the TaskManagers.
+    slotted resources, while making sure that the heavy subtasks are evenly distributed among the TaskManagers.
<img src="../fig/slot_sharing.svg" alt="TaskManagers with shared Task Slots" class="offset" width="80%" />
-The APIs also include a *[resource group](../dev/datastream_api.html#task-chaining-and-resource-groups)* mechanism which can be used to prevent undesirable slot sharing.
+The APIs also include a *[resource group](../dev/datastream_api.html#task-chaining-and-resource-groups)* mechanism which you can use to prevent undesirable slot sharing.
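For readers following this discussion, a minimal sketch of that mechanism, assuming the DataStream API's `slotSharingGroup()` setting; the group name and the toy pipeline are illustrative, not part of this PR:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
            // the source and this map stay in the "default" slot sharing group
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            })
            // isolating a (hypothetically heavy) operator in its own group
            // keeps it from sharing slots with the operators above
            .slotSharingGroup("heavy")
            .print();

        env.execute("Slot sharing group sketch");
    }
}
```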
-As a rule-of-thumb, a good default number of task slots would be the number of CPU cores.
-With hyper-threading, each slot then takes 2 or more hardware thread contexts.
+As a rule-of-thumb, a reasonable default number of task slots would be the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts.
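As a concrete illustration of that rule of thumb: in a standalone cluster the slot count is set per TaskManager via `taskmanager.numberOfTaskSlots` in `flink-conf.yaml`. The sketch below sets the equivalent option on a local environment; the 4-core machine is an assumed example:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TaskSlotSketch {
    public static void main(String[] args) {
        // assumed example: a machine with 4 CPU cores -> 4 task slots
        Configuration conf = new Configuration();
        conf.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 4);

        // local environment whose default parallelism matches the slot count
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(4, conf);
    }
}
```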
{% top %}
## State Backends
-The exact data structures in which the key/values indexes are stored depends on the chosen [state backend](../ops/state/state_backends.html). One state backend
-stores data in an in-memory hash map, another state backend uses [RocksDB](http://rocksdb.org) as the key/value store.
-In addition to defining the data structure that holds the state, the state backends also implement the logic to
-take a point-in-time snapshot of the key/value state and store that snapshot as part of a checkpoint.
+The exact data structures which store the key/value indexes depend on the chosen [state backend](../ops/state/state_backends.html). One state backend stores data in an in-memory hash map, another state backend uses [RocksDB](http://rocksdb.org) as the key/value store. In addition to defining the data structure that holds the state, the state backends also implement the logic to take a point-in-time snapshot of the key/value state and store that snapshot as part of a checkpoint.
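For context, a minimal sketch of selecting a state backend and enabling the periodic snapshots described above; the checkpoint interval and the HDFS URI are placeholder assumptions:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // take a point-in-time snapshot of the key/value state every 10 seconds
        env.enableCheckpointing(10_000);

        // FsStateBackend keeps working state on the heap and writes snapshots
        // to a file system; the URI below is a hypothetical example. Swapping
        // in RocksDBStateBackend would keep working state in RocksDB instead.
        env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"));
    }
}
```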
<img src="../fig/checkpoints.svg" alt="checkpoints and snapshots" class="offset" width="60%" />
{% top %}
## Savepoints
-Programs written in the Data Stream API can resume execution from a **savepoint**. Savepoints allow both updating your programs and your Flink cluster without losing any state.
+Programs written in the Data Stream API can resume execution from a **savepoint**. Savepoints allow updating both your programs and your Flink cluster without losing any state.
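One practical detail behind "updating your programs without losing any state": when resuming from a savepoint, Flink matches the stored state to operators by ID, so pinning IDs explicitly with `uid()` keeps a modified program compatible. A small sketch with assumed operator logic and ID:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SavepointFriendlySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
            .map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value * 2;
                }
            })
            // a stable, explicit operator ID lets a later version of this
            // program locate this operator's state in a savepoint
            .uid("doubler")
            .print();

        env.execute("Savepoint-friendly sketch");
    }
}
```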
-[Savepoints](../ops/state/savepoints.html) are **manually triggered checkpoints**, which take a snapshot of the program and write it out to a state backend. They rely on the regular checkpointing mechanism for this. During execution programs are periodically snapshotted on the worker nodes and produce checkpoints. For recovery only the last completed checkpoint is needed and older checkpoints can be safely discarded as soon as a new one is completed.
+[Savepoints](../ops/state/savepoints.html) are **manually triggered checkpoints**, which take a snapshot of the program and write it out to a state backend. They rely on the regular checkpointing mechanism for this. During execution, programs are periodically snapshotted on the worker nodes and produce checkpoints. You only need the last completed checkpoint for recovery, and you can safely discard older checkpoints as soon as a new one is completed.
-Savepoints are similar to these periodic checkpoints except that they are **triggered by the user** and **don't automatically expire** when newer checkpoints are completed. Savepoints can be created from the [command line](../ops/cli.html#savepoints) or when cancelling a job via the [REST API](../monitoring/rest_api.html#cancel-job-with-savepoint).
+Savepoints are similar to these periodic checkpoints except that they are **triggered by the user** and **don't automatically expire** when newer checkpoints are completed. You can create savepoints can from the [command line](../ops/cli.html#savepoints) or when canceling a job via the [REST API](../monitoring/rest_api.html#cancel-job-with-savepoint).
--- End diff ---
"savepoints can" -> "savepoints"