[
https://issues.apache.org/jira/browse/FLINK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637116#comment-17637116
]
Gabor Somogyi commented on FLINK-30133:
---------------------------------------
I think this area is just conceptually inconsistent, so I'm not sure what we
can do about it without a breaking change.
Here is my understanding:
* All other factory classes make the workload ultimately fail if something bad
happens.
* `security.module.factory.classes` contains `HadoopModuleFactory` by default,
which is fine.
* When hadoop-common is not on the classpath, it only logs an info message and
silently skips loading the module. We can consider this [best-effort
behavior|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L41].
* Then it tries to load the Hadoop configuration in the [mentioned
place|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L51],
which is just bad in general. It uses code from the `flink-hadoop-fs` area,
where
[HdfsConfiguration|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopUtils.java#L59]
is instantiated. This implicitly means one MUST have the HDFS jars on the
classpath to run Flink securely (the whole flow is sketched after this list).
I keep thinking about refactoring this, but it is definitely a breaking change
(though I would support a full rewrite of the Hadoop config loading, since
Flink has at least 6-7 different Hadoop config loading implementations, which
makes this area hell :) ).
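For reference, the whole flow looks roughly like the following. This is a
hedged paraphrase of the two linked code locations, not the verbatim source;
the class name of the factory and the exact class probed by the classpath
check are my assumptions:
{code:java}
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.modules.HadoopModule;
import org.apache.flink.runtime.security.modules.SecurityModule;
import org.apache.flink.runtime.security.modules.SecurityModuleFactory;
import org.apache.flink.runtime.util.HadoopUtils;

import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Paraphrased flow of HadoopModuleFactory#createModule (illustrative name).
public class ParaphrasedHadoopModuleFactory implements SecurityModuleFactory {

    private static final Logger LOG =
            LoggerFactory.getLogger(ParaphrasedHadoopModuleFactory.class);

    @Override
    public SecurityModule createModule(SecurityConfiguration securityConfig) {
        // Best-effort step (around L41 of the linked file): if hadoop-common
        // is not on the classpath, log at INFO and skip the module silently.
        try {
            Class.forName(
                    "org.apache.hadoop.security.UserGroupInformation",
                    false,
                    HadoopModule.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            LOG.info("Hadoop not found on the classpath, not loading the Hadoop security module.");
            return null;
        }

        // Config loading step (around L51 of the linked file): this goes
        // through flink-hadoop-fs, which instantiates HdfsConfiguration, so
        // the HDFS jars are implicitly required on top of hadoop-common.
        try {
            Configuration hadoopConfig =
                    HadoopUtils.getHadoopConfiguration(securityConfig.getFlinkConfig());
            return new HadoopModule(securityConfig, hadoopConfig);
        } catch (LinkageError e) {
            // This ERROR log is what FLINK-30133 complains about: the null
            // return is handled gracefully, so the level overstates severity.
            LOG.error("Cannot create Hadoop Security Module.", e);
            return null;
        }
    }
}
{code}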
Considering the current situation, we can decrease the log level from error to
warning, since the workload keeps going forward anyway.
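Concretely, that short-term fix would only touch the catch block from the
sketch above (same assumptions as there):
{code:java}
        } catch (LinkageError e) {
            // The caller handles the null return and the workload keeps
            // running, so WARN reflects the severity better than ERROR.
            LOG.warn("Cannot create Hadoop Security Module.", e);
            return null;
        }
{code}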
If you ask me, for the clean solution I would do the following, now or later:
* Remove `HdfsConfiguration` from Flink's Hadoop config loading, since HDFS
configs are not needed for Kerberos authentication.
* Make the workload ultimately fail if the module could not be loaded/installed
(hadoop-common being on the classpath means the user intends to install the
module).
I know that my clean solution would be a drastic change, but it would be clear
to the users.
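A rough sketch of that clean behavior, reusing the types from the sketch
above (hedged: the plain `Configuration` below ignores any Hadoop config dirs
configured via Flink, which a real patch would still have to handle):
{code:java}
    @Override
    public SecurityModule createModule(SecurityConfiguration securityConfig) {
        // Still best effort while hadoop-common is absent: no Hadoop on the
        // classpath means the user has no intention to install the module.
        try {
            Class.forName(
                    "org.apache.hadoop.security.UserGroupInformation",
                    false,
                    HadoopModule.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            LOG.info("Hadoop not found on the classpath, not loading the Hadoop security module.");
            return null;
        }

        // hadoop-common is present, so the user intends to use the module.
        // No try/catch from here on: any failure propagates and finally
        // fails the workload instead of being swallowed.
        // Kerberos only needs core-site.xml style settings, so a plain
        // Configuration (no HdfsConfiguration, hence no HDFS jars) is enough.
        Configuration hadoopConfig = new Configuration();
        return new HadoopModule(securityConfig, hadoopConfig);
    }
{code}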
> HadoopModuleFactory creates error if the security module cannot be loaded
> -------------------------------------------------------------------------
>
> Key: FLINK-30133
> URL: https://issues.apache.org/jira/browse/FLINK-30133
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hadoop Compatibility
> Affects Versions: 1.16.0, 1.17.0, 1.15.2
> Reporter: Matthias Pohl
> Priority: Minor
> Labels: starter
>
> [HadoopModuleFactory|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L51]
> tries to load the {{HadoopModule}}. If it fails to load the module, it
> will log an error and return {{null}}, which is going to be handled properly.
> The resulting error log is, therefore, confusing. We might want to lower the
> log level to warning since the error doesn't affect the Flink cluster in a
> fatal way.
> Alternatively, we might want to make the cluster fail fatally if we consider
> this a severe usability problem.