[
https://issues.apache.org/jira/browse/FLINK-21768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-21768.
---------------------------------
Fix Version/s: 1.13.0
Resolution: Fixed
Fixed via f3155e6c0213de7bf4b58a89fb1e1331dee7701a
> Optimize system.exit() logic of CliFrontend
> -------------------------------------------
>
> Key: FLINK-21768
> URL: https://issues.apache.org/jira/browse/FLINK-21768
> Project: Flink
> Issue Type: Improvement
> Components: Command Line Client
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0
>
>
> h2. Why
> We encountered a problem when integrating Oozie with a Flink batch action.
> Oozie uses a launcher job to start the Flink client, which submits the Flink
> job to Hadoop YARN.
> When the Flink client finishes, Oozie reads its exit code to determine the job
> submission status and then performs follow-up actions.
> So how does Oozie catch {{System.exit()}}? It installs its own JDK SecurityManager.
> ([Oozie related code
> link|https://github.com/apache/oozie/blob/f1e01a9e155692aa5632f4573ab1b3ebeab7ef45/sharelib/oozie/src/main/java/org/apache/oozie/action/hadoop/security/LauncherSecurityManager.java#L24]).
>
> When the Flink client finishes successfully, it calls
> {{System.exit(0)}} ([Flink related code
> link|https://github.com/apache/flink/blob/195298aea327b3f98d9852121f0f146368696300/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1133]).
> The JVM then consults the LauncherSecurityManager installed by Oozie to handle
> {{System.exit(0)}}, which triggers {{LauncherSecurityManager.checkExit()}},
> and that method throws an exception.
> Finally, the Flink client catches this {{Throwable}} and calls
> {{System.exit(31)}} ([related code
> link|https://github.com/apache/flink/blob/195298aea327b3f98d9852121f0f146368696300/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1139]).
> As a result, Oozie misjudges the status of the Flink job.
> Admittedly this is a corner case; in most scenarios the situation I described
> will not happen. But it is still worth improving the client exit logic.
> Besides, I think the problem above may also exist in other frameworks
> such as linkedin/azkaban and apache/airflow, which use the Flink client to
> submit batch jobs.
> Flink related code:
> {code:java}
> public static void main(final String[] args) {
>     EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);
>
>     // 1. find the configuration directory
>     final String configurationDirectory = getConfigurationDirectoryFromEnv();
>
>     // 2. load the global configuration
>     final Configuration configuration =
>             GlobalConfiguration.loadConfiguration(configurationDirectory);
>
>     // 3. load the custom command lines
>     final List<CustomCommandLine> customCommandLines =
>             loadCustomCommandLines(configuration, configurationDirectory);
>
>     try {
>         final CliFrontend cli = new CliFrontend(configuration, customCommandLines);
>         SecurityUtils.install(new SecurityConfiguration(cli.configuration));
>         int retCode =
>                 SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(args));
>         System.exit(retCode);
>     } catch (Throwable t) {
>         final Throwable strippedThrowable =
>                 ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
>         LOG.error("Fatal error while running command line interface.", strippedThrowable);
>         strippedThrowable.printStackTrace();
>         System.exit(31);
>     }
> }
> {code}
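
The control flow described in the quoted report can be reproduced in isolation. The following is a minimal sketch, not Flink or Oozie code: it replaces {{System.exit()}} with a hypothetical `IntConsumer` that records the requested code and then throws, on the assumption that this is roughly what a launcher's SecurityManager does in `checkExit()`. All names here (`ExitLogicSketch`, `TrappingExit`, `ExitTrappedException`, `exitInsideTry`, `exitInFinally`) are invented for illustration. The second variant is only one possible restructuring and is not claimed to match the referenced commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

public class ExitLogicSketch {

    /** Hypothetical stand-in for the exception a launcher throws when it traps an exit. */
    static final class ExitTrappedException extends SecurityException {
        ExitTrappedException(int status) {
            super("Intercepted exit(" + status + ")");
        }
    }

    /** Hypothetical exit hook: records each requested code, then throws instead of exiting. */
    static final class TrappingExit implements IntConsumer {
        final List<Integer> requested = new ArrayList<>();

        @Override
        public void accept(int status) {
            requested.add(status);
            throw new ExitTrappedException(status);
        }
    }

    /** Same shape as the quoted main(): the exit call sits inside the try block. */
    public static void exitInsideTry(IntConsumer exit) {
        try {
            int retCode = 0; // pretend the command ran successfully
            exit.accept(retCode); // throws, so control jumps to the catch block
        } catch (Throwable t) {
            exit.accept(31); // the trapped exit is mistaken for a fatal error
        }
    }

    /** One possible restructuring: compute the code in try, request exit once in finally. */
    public static void exitInFinally(IntConsumer exit) {
        int retCode = 31; // default to the error code until the command succeeds
        try {
            retCode = 0; // pretend the command ran successfully
        } catch (Throwable t) {
            // a real client would log the failure here; retCode stays 31
        } finally {
            exit.accept(retCode);
        }
    }

    /** Runs a main-like method against a fresh trapping hook and returns the codes it saw. */
    public static List<Integer> requestedCodes(Consumer<IntConsumer> mainLike) {
        TrappingExit exit = new TrappingExit();
        try {
            mainLike.accept(exit);
        } catch (ExitTrappedException expected) {
            // a launcher would catch this and inspect the recorded code
        }
        return exit.requested;
    }

    public static void main(String[] args) {
        System.out.println("exit inside try: " + requestedCodes(ExitLogicSketch::exitInsideTry));
        System.out.println("exit in finally: " + requestedCodes(ExitLogicSketch::exitInFinally));
    }
}
```

With the exit call inside the try block, the hook sees 0 followed by 31, so a launcher that keeps the last code it observed would report a failure for a successful submission. With the exit call in the finally block, the hook sees only 0.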