[ https://issues.apache.org/jira/browse/FLINK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637116#comment-17637116 ]

Gabor Somogyi commented on FLINK-30133:
---------------------------------------

I think this area is conceptually inconsistent, so I'm not sure what we can do 
about it without a breaking change.

Here is my understanding:
 * All other factory classes make the workload fail in the end if something bad 
happens.
 * `security.module.factory.classes` contains `HadoopModuleFactory` by default, 
which is fine.
 * When hadoop-common is not on the classpath, it only logs an info message and 
does not load the module. We can consider it [best effort 
behavior|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L41].
 * Then it tries to load the Hadoop configuration in the [mentioned 
place|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L51],
 but this is just bad in general. It uses code from the `flink-hadoop-fs` area, 
where 
[HdfsConfiguration|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopUtils.java#L59]
 is instantiated. This implicitly means one MUST have the HDFS jars on the 
classpath to run Flink securely (see the sketch after this list). I keep 
thinking about refactoring this, but it is definitely a breaking change 
(though I would support a full rewrite of the Hadoop config loading, since 
Flink has at least 6-7 different Hadoop config loading implementations, which 
makes this area hell :) ).
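
To make the inconsistency concrete, here is a condensed sketch of the control 
flow as I read it in the linked source (simplified and renamed, not the exact 
Flink code):

```java
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.modules.HadoopModule;
import org.apache.flink.runtime.security.modules.SecurityModule;
import org.apache.flink.runtime.util.HadoopUtils;
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HadoopModuleFactorySketch {
    private static final Logger LOG =
            LoggerFactory.getLogger(HadoopModuleFactorySketch.class);

    public SecurityModule createModule(SecurityConfiguration securityConfig) {
        // Best-effort part: if hadoop-common is missing, log at INFO and skip.
        try {
            Class.forName(
                    "org.apache.hadoop.conf.Configuration",
                    false,
                    HadoopModule.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            LOG.info("Hadoop is not on the classpath, skipping the Hadoop security module.");
            return null;
        }

        try {
            // Problematic part: HadoopUtils.getHadoopConfiguration(...) instantiates
            // HdfsConfiguration (flink-hadoop-fs), so the HDFS jars become a hard
            // requirement even for plain Kerberos authentication.
            Configuration hadoopConfiguration =
                    HadoopUtils.getHadoopConfiguration(securityConfig.getFlinkConfig());
            return new HadoopModule(securityConfig, hadoopConfiguration);
        } catch (LinkageError | Exception e) {
            // Logged at ERROR today, even though the caller treats null as "skip".
            LOG.error("Cannot create Hadoop Security Module.", e);
            return null;
        }
    }
}
```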

Considering the current situation, we can decrease the log level from error to 
warning, since the workload keeps going anyway.
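
A minimal illustration of that quick fix (a hypothetical helper of mine, not 
Flink code): keep the best-effort behavior, but log at WARN because nothing 
fatal happened.

```java
import java.util.concurrent.Callable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical helper: the caller treats a null module as "skipped",
// so the failure is logged at WARN instead of ERROR.
public final class BestEffortModuleLoader {
    private static final Logger LOG =
            LoggerFactory.getLogger(BestEffortModuleLoader.class);

    public static <T> T loadOrSkip(Callable<T> loader, String moduleName) {
        try {
            return loader.call();
        } catch (Exception e) {
            LOG.warn("Cannot create security module {}; continuing without it.", moduleName, e);
            return null;
        }
    }
}
```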
If you ask me, then now or later I would do the following for a clean solution:
 * Remove `HdfsConfiguration` from Flink's Hadoop config loading, since HDFS 
configs are not needed for Kerberos authentication.
 * Make the workload fail in the end if the module cannot be loaded/installed 
(hadoop-common is on the classpath, so the user intends to install the 
module); see the sketch below.

I know that my clean solution would be a drastic change, but it would be clear 
to the users.
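
A sketch of what the fail-fast variant could look like (my proposal only, not 
existing Flink code; `loadHadoopConfiguration` is a hypothetical helper that 
would avoid `HdfsConfiguration`):

```java
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.modules.HadoopModule;
import org.apache.flink.runtime.security.modules.SecurityModule;
import org.apache.flink.util.FlinkRuntimeException;
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FailFastHadoopModuleFactory {
    private static final Logger LOG =
            LoggerFactory.getLogger(FailFastHadoopModuleFactory.class);

    public SecurityModule createModule(SecurityConfiguration securityConfig) {
        // Still best effort here: without hadoop-common the user clearly has
        // no intention to use Hadoop security, so skipping is fine.
        try {
            Class.forName(
                    "org.apache.hadoop.conf.Configuration",
                    false,
                    HadoopModule.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            LOG.info("hadoop-common is not on the classpath, skipping the Hadoop security module.");
            return null;
        }

        try {
            Configuration hadoopConfiguration = loadHadoopConfiguration(securityConfig);
            return new HadoopModule(securityConfig, hadoopConfiguration);
        } catch (LinkageError | Exception e) {
            // hadoop-common is present, so the user intends to install the
            // module: fail fast instead of returning null.
            throw new FlinkRuntimeException("Hadoop security module could not be installed.", e);
        }
    }

    // Hypothetical helper: would resolve core-site.xml etc. via plain
    // org.apache.hadoop.conf.Configuration, without touching any HDFS classes.
    private Configuration loadHadoopConfiguration(SecurityConfiguration securityConfig) {
        return new Configuration();
    }
}
```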


> HadoopModuleFactory creates error if the security module cannot be loaded
> -------------------------------------------------------------------------
>
>                 Key: FLINK-30133
>                 URL: https://issues.apache.org/jira/browse/FLINK-30133
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hadoop Compatibility
>    Affects Versions: 1.16.0, 1.17.0, 1.15.2
>            Reporter: Matthias Pohl
>            Priority: Minor
>              Labels: starter
>
> [HadoopModuleFactory|https://github.com/apache/flink/blob/26aa543b3bbe2b606bbc6d332a2ef7c5b46d25eb/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModuleFactory.java#L51]
>  tries to load the {{HadoopModule}}. If it fails to load the module, it 
> will log an error and return {{null}}, which is then handled properly. 
> The resulting error log is, therefore, confusing. We might want to lower the 
> log level to warning since the error doesn't affect the Flink cluster in a 
> fatal way.
> Alternatively, we might want to make the cluster fail fatally if we consider 
> this a severe usability problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
