[
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545125#comment-14545125
]
ASF GitHub Bot commented on FLINK-1525:
---------------------------------------
Github user uce commented on a diff in the pull request:
https://github.com/apache/flink/pull/664#discussion_r30391587
--- Diff: docs/apis/best_practices.md ---
@@ -0,0 +1,155 @@
+---
+title: "Best Practices"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<a href="#top"></a>
+
+
+This page contains a collection of best practices for Flink programmers on how to solve frequently encountered problems.
+
+
+* This will be replaced by the TOC
+{:toc}
+
+## Parsing command line arguments and passing them around in your Flink application
+
+
+Almost all Flink applications, both batch and streaming, rely on external configuration parameters.
+They are used, for example, to specify input and output sources (like paths or addresses), system parameters (parallelism, runtime configuration), and application-specific parameters (often used within the user functions).
+
+Since version 0.9, Flink provides a simple utility called `ParameterTool` that offers at least some basic tooling for solving these problems.
+
+As you'll see, Flink is very flexible when it comes to parsing input parameters. You are free to choose any other framework, like [Commons CLI](https://commons.apache.org/proper/commons-cli/), [argparse4j](http://argparse4j.sourceforge.net/), or others.
+
+
+### Getting your configuration values into the `ParameterTool`
+
+The `ParameterTool` provides a set of predefined static methods for reading the configuration. Internally, the tool expects a `Map<String, String>`, so it's very easy to integrate it with your own configuration style.
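Because the tool is map-backed, integrating a custom configuration source mostly means producing that map. The following is an illustrative sketch only, not Flink's actual implementation; the class name `MapConfig` and its methods are made up to show the map-with-defaults pattern:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative, hypothetical map-backed lookup in the spirit of ParameterTool.
class MapConfig {
    private final Map<String, String> data;

    MapConfig(Map<String, String> data) {
        // defensive copy so later changes to the caller's map don't leak in
        this.data = new HashMap<>(data);
    }

    // Return the value for the key, or the given default if absent.
    String get(String key, String defaultValue) {
        return data.getOrDefault(key, defaultValue);
    }

    // Fail fast on missing required keys.
    String getRequired(String key) {
        String value = data.get(key);
        if (value == null) {
            throw new RuntimeException("No data for required key '" + key + "'");
        }
        return value;
    }
}
```

Any configuration framework that can hand you a `Map<String, String>` plugs into this shape.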
+
+
+#### From `.properties` files
+
+The following method will read a [Properties](https://docs.oracle.com/javase/tutorial/essential/environment/properties.html) file and provide the key/value pairs:
+{% highlight java %}
+String propertiesFile = "/home/sam/flink/myjob.properties";
+ParameterTool parameter = ParameterTool.fromPropertiesFile(propertiesFile);
+{% endhighlight %}
+
+
+#### From the command line arguments
+
+This allows getting arguments like `--input hdfs:///mydata --elements 42` from the command line.
+{% highlight java %}
+public static void main(String[] args) {
+ ParameterTool parameter = ParameterTool.fromArgs(args);
+	// .. regular code ..
+}
+{% endhighlight %}
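Conceptually, parsing `--key value` pairs boils down to walking the argument array and filling a `Map<String, String>`. This is a self-contained sketch of that idea, not Flink's actual parsing code; the class name `ArgsSketch` is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of turning "--key value" arguments into a map.
class ArgsSketch {
    static Map<String, String> parse(String[] args) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("--")) {
                String key = args[i].substring(2);
                // A value follows unless the next token is another "--key".
                if (i + 1 < args.length && !args[i + 1].startsWith("--")) {
                    map.put(key, args[++i]);
                } else {
                    map.put(key, ""); // flag without a value
                }
            }
        }
        return map;
    }
}
```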
+
+
+#### From system properties
+
+When starting a JVM, you can pass system properties to it: `-Dinput=hdfs:///mydata`. You can also initialize the `ParameterTool` from these system properties:
+
+{% highlight java %}
+ParameterTool parameter = ParameterTool.fromSystemProperties();
+{% endhighlight %}
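Under the hood, this amounts to snapshotting the JVM's system properties into the same `Map<String, String>` shape. A minimal standalone sketch (again hypothetical, not Flink's code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch: collect JVM system properties (-Dkey=value) into a map.
class SysPropsSketch {
    static Map<String, String> snapshot() {
        Properties props = System.getProperties();
        Map<String, String> map = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            map.put(name, props.getProperty(name));
        }
        return map;
    }
}
```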
+
+
+### Using the parameters in your Flink program
+
+Now that we've got the parameters from somewhere (see above), we can use them in various ways.
+
+**Directly from the `ParameterTool`**
+
+The `ParameterTool` itself has methods for accessing the values.
+{% highlight java %}
+ParameterTool parameters = // ...
+parameters.getRequired("input");
+parameters.get("output", "myDefaultValue");
+parameters.getLong("expectedCount", -1L);
+parameters.getNumberOfParameters();
+// .. there are more methods available.
+{% endhighlight %}
+
+You can use the return values of these methods directly in the main() method (i.e. in the client submitting the application).
+For example, you could set the parallelism of an operator like this:
+
+{% highlight java %}
+ParameterTool parameters = ParameterTool.fromArgs(args);
+DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer())
+    .setParallelism(parameters.getInt("mapParallelism", 2));
--- End diff --
Maybe to make this pattern more readable... do
```
int parallelism = parameters.getInt("mapParallelism", 2);
counts = ....setParallelism(parallelism);
```
> Provide utils to pass -D parameters to UDFs
> --------------------------------------------
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
> Issue Type: Improvement
> Components: flink-contrib
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the
> command line.
> Right now, Flink users have to manually parse command line arguments and pass
> them to the methods.
> It would be nice to provide a standard args parser which takes care of
> such stuff.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)