[GitHub] flink pull request #3259: Documentation: Production readiness checklist

alpinegizmo Fri, 03 Feb 2017 08:55:58 -0800

Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3259#discussion_r99376685
  
    --- Diff: docs/ops/production_ready.md ---
    @@ -0,0 +1,88 @@
    +---
    +title: "Production Readiness Checklist"
    +nav-parent_id: setup
    +nav-pos: 20
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +* ToC
    +{:toc}
    +
    +## Production Readiness Checklist
    +
    +The purpose of this production readiness checklist is to provide a condensed overview of configuration options that are
    +important and need **careful consideration** if you plan to bring your Flink job into **production**. For most of these options,
    +Flink provides out-of-the-box defaults to make usage and adoption of Flink easier. For many users and scenarios, those
    +defaults are good starting points for development and completely sufficient for "one-shot" jobs.
    +
    +However, once you plan to bring a Flink application to production, the requirements typically increase. For example,
    +you want your job to be (re-)scalable and to have a good upgrade story for your job and new Flink versions.
    +
    +In the following, we present a collection of configuration options that you should check before your job goes into production.
    +
    +### Set maximum parallelism for operators explicitly
    +
    +Maximum parallelism is a configuration parameter that was newly introduced in Flink 1.2 and has important implications
    +for the (re-)scalability of your Flink job. This parameter, which can be set at per-job and/or per-operator granularity,
    +determines the maximum parallelism to which you can scale operators. It is important to understand that (as of now) there
    +is **no way to increase** this parameter after your job has been started, except for restarting your job completely
    +from scratch (i.e. with a new state, and not from a previous checkpoint/savepoint). Even if Flink provides some way
    +to change the maximum parallelism for existing savepoints in the future, you can already assume that for large states this is
    +likely to be a long-running operation that you want to avoid. At this point, you might wonder why you should not just use a very high
    +value as the default for this parameter. The reason is that a high maximum parallelism can have some impact on your
    +application's performance and even state sizes, because Flink has to maintain certain metadata for its ability to rescale, which
    +can increase with the maximum parallelism. In general, you should choose a max parallelism that is high enough to fit your
    +future needs in scalability, but keeping it as low as possible can give slightly better performance. In particular,
    +a maximum parallelism higher than 128 will typically result in slightly bigger state snapshots from the keyed backends.
    +
    +Note that the maximum parallelism must fulfill the following conditions:
    +
    +`0 < parallelism <= max parallelism <= 2^15`
    +
    +You can set the maximum parallelism by `setMaxParallelism(int maxparallelism)`. By default, Flink will choose the maximum
    +parallelism as a function of the parallelism when the job is first started:
    +
    +- `128` : for all parallelism <= 128.
    +- `MIN(nextPowerOfTwo(parallelism + (parallelism / 2)), 2^15)` : for all parallelism > 128.
    +
    +### Set UUIDs for operators
    +
    +As mentioned in the documentation for [savepoints]({{ site.baseurl }}/setup/savepoints.html), users should set uids for
    +operators. Those operator uids are important for Flink's mapping of operator states to operators which, in turn, is
    +essential for savepoints. By default, operator uids are generated by traversing the JobGraph and hashing certain operator
    +properties. While this is convenient from a user perspective, it is also very fragile to changes in the JobGraph (e.g.
    +if you want to exchange an operator). To establish a stable mapping, we need stable operator uids provided by the user
    +through `setUid(String uid)`.
    +
    +### Choice of state backend
    +
    +Currently, Flink has the limitation that it can only restore the state from a savepoint for the same state backend that
    +took the savepoint. For example, this means that we cannot take a savepoint with a memory state backend, then change
    +the job to use RocksDB state backend and restore. While we are planning to make backends interoperable in the near
    --- End diff --
    
    to use a RocksDB
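
For readers of the quoted diff, the default maximum parallelism rule and the `0 < parallelism <= max parallelism <= 2^15` condition can be sketched in plain Java. This is only an illustration of the two bullet points as written in the diff. The class `MaxParallelismSketch` and its methods are hypothetical names, not Flink API, and Flink's actual implementation may differ in detail.

```java
/** Hypothetical helper (not part of Flink): mirrors the rules quoted in the diff. */
final class MaxParallelismSketch {

    /** Upper bound stated in the diff: max parallelism <= 2^15. */
    static final int UPPER_BOUND = 1 << 15;

    /** Default stated in the diff for all parallelism <= 128. */
    static final int SMALL_JOB_DEFAULT = 128;

    private MaxParallelismSketch() {}

    /** Smallest power of two that is >= x (returns 1 for x <= 1). */
    static int nextPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    /** Default max parallelism as described by the two bullet points in the diff. */
    static int defaultMaxParallelism(int parallelism) {
        if (parallelism <= 0 || parallelism > UPPER_BOUND) {
            throw new IllegalArgumentException(
                    "parallelism must be in (0, " + UPPER_BOUND + "]: " + parallelism);
        }
        if (parallelism <= SMALL_JOB_DEFAULT) {
            return SMALL_JOB_DEFAULT;
        }
        return Math.min(nextPowerOfTwo(parallelism + (parallelism / 2)), UPPER_BOUND);
    }

    /** Checks 0 < parallelism <= maxParallelism <= 2^15. */
    static boolean isValid(int parallelism, int maxParallelism) {
        return 0 < parallelism
                && parallelism <= maxParallelism
                && maxParallelism <= UPPER_BOUND;
    }

    public static void main(String[] args) {
        // Print the default for a few sample parallelism values.
        for (int p : new int[] {1, 128, 129, 200, 30000}) {
            System.out.println(p + " -> " + defaultMaxParallelism(p));
        }
    }
}
```

Under these assumptions, a job started with parallelism 200 gets a default of 512, so it could later be rescaled up to 512 but no further without starting from fresh state.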



