[ 
https://issues.apache.org/jira/browse/FLINK-23952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404845#comment-17404845
 ] 

Xintong Song edited comment on FLINK-23952 at 8/26/21, 2:32 AM:
----------------------------------------------------------------

h3. Why it worked fine in 1.13.1 but not in 1.13.2

By design, the CPU cores and all memory sizes should be calculated before the
Java process starts, and they should be set explicitly via configuration
options. Note that this may overwrite existing configurations. E.g., the user
may configure a [min, max] range for the network memory size; Flink's automatic
calculation logic then decides on a specific value within that range and sets
both the min and max config options to that value, ensuring it stays
consistent during the entire lifecycle of the process.
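Here is a simplified Python sketch of the "resolve once, then pin" idea. It is not Flink's implementation, which does this calculation in Java. The fraction-based derivation and the default fraction/min/max values are assumptions for illustration. The option keys are Flink's network memory options.

```python
# Simplified sketch of "resolve a [min, max] range once, then pin it".
# NOT Flink's actual implementation; the fraction-based derivation and the
# default values below are assumptions for illustration.

MB = 1024 * 1024


def resolve_network_memory(conf: dict, total_flink_memory: int) -> dict:
    """Return a copy of conf with the network memory range pinned to one value."""
    fraction = float(conf.get("taskmanager.memory.network.fraction", 0.1))
    lo = int(conf.get("taskmanager.memory.network.min", 64 * MB))
    hi = int(conf.get("taskmanager.memory.network.max", 1024 * MB))
    if lo > hi:
        raise ValueError("network memory min must not exceed max")
    # Derive a size from the fraction, then clamp it into [min, max].
    size = min(max(int(total_flink_memory * fraction), lo), hi)
    resolved = dict(conf)  # leave the caller's configuration untouched
    # Overwrite both bounds so every later reader sees the same value.
    resolved["taskmanager.memory.network.min"] = size
    resolved["taskmanager.memory.network.max"] = size
    return resolved


if __name__ == "__main__":
    user_conf = {"taskmanager.memory.network.min": 64 * MB,
                 "taskmanager.memory.network.max": 512 * MB}
    out = resolve_network_memory(user_conf, total_flink_memory=2048 * MB)
    print(out["taskmanager.memory.network.min"] // MB,
          out["taskmanager.memory.network.max"] // MB)
```

Take a total Flink memory of 2048 MB and a user range of [64 MB, 512 MB]. The derived value (about 204 MB) falls inside the range, and the sketch writes it to both min and max.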

There is internal logic inside the TaskManager that relies on the assumption
that all CPU/memory config options are explicitly set. E.g., Flink uses the min
value from the configuration as the network memory size, expecting max to be
configured to the same value. However, Flink previously did not check whether
all such options were explicitly configured. That explains how your scripts
worked fine in 1.13.1. Although no serious problems were observed, the memory
management may not have worked as designed/expected, in terms of stability and
resource efficiency.
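Another Python sketch illustrates the failure mode. It contrasts two paths:

* A lenient reader that silently takes min, so an incomplete or unpinned configuration still appears to work.
* A strict path that fails fast on missing or inconsistent options, similar in spirit to the errors reported after upgrading to 1.13.2.

REQUIRED_KEYS is a hypothetical subset chosen for illustration, not the list Flink actually checks.

```python
MB = 1024 * 1024

# Hypothetical subset of options, for illustration only; not Flink's real list.
REQUIRED_KEYS = (
    "taskmanager.cpu.cores",
    "taskmanager.memory.task.heap.size",
    "taskmanager.memory.managed.size",
    "taskmanager.memory.network.min",
    "taskmanager.memory.network.max",
)


class InvalidResourceConfigError(Exception):
    """Raised when a pre-computed resource option is missing or inconsistent."""


def check_explicitly_set(conf: dict) -> None:
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise InvalidResourceConfigError(f"required option(s) not set: {missing}")
    lo = conf["taskmanager.memory.network.min"]
    hi = conf["taskmanager.memory.network.max"]
    if lo != hi:
        raise InvalidResourceConfigError(f"network memory not pinned: min={lo}, max={hi}")


def network_memory_size(conf: dict, strict: bool) -> int:
    if strict:
        check_explicitly_set(conf)
    # Both paths use min as the network memory size. Without the strict
    # check, a missing or different max goes unnoticed.
    return int(conf.get("taskmanager.memory.network.min", 64 * MB))


if __name__ == "__main__":
    partial = {"taskmanager.memory.network.min": 64 * MB,
               "taskmanager.memory.network.max": 1024 * MB}
    print(network_memory_size(partial, strict=False) // MB)  # 64
    try:
        network_memory_size(partial, strict=True)
    except InvalidResourceConfigError as err:
        print("strict:", err)
```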

h3. Running Flink with custom scripts

If the built-in scripts do not meet your needs, calling BashJavaUtils from your
custom scripts should work. The key point is to calculate and configure the
resources in advance, consistently with what the other Flink components
expect. However, as [~chesnay] mentioned, there's no guarantee that these
internals will stay compatible in future releases. They can change at any time
without notice, which means you may run into this kind of problem again in the
future.
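For custom scripts, one way to keep the resources consistent is to forward the options resolved by BashJavaUtils verbatim rather than recomputing any of them. The sketch below assumes the script has already captured the BashJavaUtils dynamic-configuration output as a single line of "-D key=value" pairs. That output format is an internal detail that may change between releases, as noted above. The helper names and sample values are hypothetical.

```python
import shlex


def parse_dynamic_configs(line: str) -> dict:
    """Parse '-D key=value' (or '-Dkey=value') pairs into an ordered dict."""
    tokens = shlex.split(line)
    result = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-D" and i + 1 < len(tokens):
            pair = tokens[i + 1]
            i += 2
        elif tok.startswith("-D") and len(tok) > 2:
            pair = tok[2:]
            i += 1
        else:
            raise ValueError(f"unexpected token: {tok!r}")
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        result[key] = value
    return result


def build_taskmanager_args(dynamic_line: str) -> list:
    """Forward the resolved options unchanged, after a sanity check."""
    conf = parse_dynamic_configs(dynamic_line)
    lo = conf.get("taskmanager.memory.network.min")
    hi = conf.get("taskmanager.memory.network.max")
    # The resolved output should already pin min == max; refuse otherwise.
    if lo is None or lo != hi:
        raise ValueError("network memory min/max not pinned to the same value")
    args = []
    for key, value in conf.items():
        args += ["-D", f"{key}={value}"]
    return args


if __name__ == "__main__":
    sample = ("-D taskmanager.cpu.cores=1.0 "
              "-D taskmanager.memory.network.min=128mb "
              "-D taskmanager.memory.network.max=128mb")
    print(build_taskmanager_args(sample))
```

The sketch deliberately does no resource arithmetic of its own. Whatever the utility resolved is passed through as-is, so the process sees one consistent set of values.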

Alternatively, you may consider filing Jira tickets for your complaints about
the built-in scripts. That would be an appreciated contribution to the
community.

BTW, I don't think taskmanager.sh needs to read FLINK_PLUGINS_DIR, because it
is exported as an environment variable and is read directly by the Java process.
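The point about FLINK_PLUGINS_DIR is ordinary environment inheritance. Once a variable is exported, any child process can read it without the launching script parsing or forwarding it. Below is a minimal Python stand-in, where a child interpreter plays the role of the Java process. The helper name and the path are made-up examples.

```python
import os
import subprocess
import sys


def child_sees_plugins_dir(plugins_dir: str) -> str:
    """Launch a child process with FLINK_PLUGINS_DIR exported and report what it reads."""
    # Equivalent of `export FLINK_PLUGINS_DIR=...` in the launching script.
    env = dict(os.environ, FLINK_PLUGINS_DIR=plugins_dir)
    # The child reads the variable itself; the parent passes no argument for it.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('FLINK_PLUGINS_DIR', ''))"],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_sees_plugins_dir("/opt/flink/plugins"))
```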


> Taskmanager fails to start complaining about missing configuration option
> -------------------------------------------------------------------------
>
>                 Key: FLINK-23952
>                 URL: https://issues.apache.org/jira/browse/FLINK-23952
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>    Affects Versions: 1.13.2
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>         Attachments: flink-conf.yaml, taskmanager.log, taskmanager_start.txt
>
>
> The TaskManager now fails to start after I upgraded to 1.13.2. It worked fine
> in 1.13.1.
> It suddenly started complaining about missing configuration options that are
> not really required according to the documentation. When I tried to set the
> one it complained about, it started complaining about another one.
>  
> Please see attached files:
> taskmanager_start.txt - actual command that is used to start the program
> flink-conf.yaml - configuration file
> taskmanager.log - logfile where you can see the exception
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
