[ 
https://issues.apache.org/jira/browse/FLINK-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006692#comment-17006692
 ] 

Piotr Nowojski commented on FLINK-15378:
----------------------------------------

Thanks for the explanations, I think I now understand the issue.

For one thing, the current approach in the [proposed 
PR|https://github.com/apache/flink/pull/10686/] is not generic enough: it 
limits the support for different configurations to just {{StreamingFileSink}}. 
It would be better if we allowed identifying plugins by parts of the URI (for 
example {{host}} or {{port}}, as suggested by [~fly_in_gis]).
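
To make that concrete, here is a minimal, self-contained sketch (plain Java, 
not Flink's actual {{FS_FACTORIES}} code; the {{FactoryKey}} class and the map 
below are hypothetical) of keying factories by scheme plus authority instead 
of scheme only:

{code:java}
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FactoryLookupSketch {

    // Hypothetical lookup key: scheme plus authority (host:port) instead of scheme only.
    static final class FactoryKey {
        private final String scheme;
        private final String authority; // e.g. "namenode1:8020"

        FactoryKey(URI uri) {
            this.scheme = uri.getScheme();
            this.authority = uri.getAuthority();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FactoryKey)) {
                return false;
            }
            FactoryKey other = (FactoryKey) o;
            return Objects.equals(scheme, other.scheme)
                    && Objects.equals(authority, other.authority);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority);
        }
    }

    // Stand-in for FileSystem#FS_FACTORIES, keyed by (scheme, authority) rather than scheme.
    private static final Map<FactoryKey, String> FACTORIES = new HashMap<>();

    public static void main(String[] args) {
        FACTORIES.put(new FactoryKey(URI.create("hdfs://namenode1:8020/")), "plugin instance with conf A");
        FACTORIES.put(new FactoryKey(URI.create("hdfs://namenode2:8020/")), "plugin instance with conf B");

        // Both URIs share the "hdfs" scheme but resolve to different factory entries.
        System.out.println(FACTORIES.get(new FactoryKey(URI.create("hdfs://namenode1:8020/tmp/out"))));
        System.out.println(FACTORIES.get(new FactoryKey(URI.create("hdfs://namenode2:8020/tmp/out"))));
    }
}
{code}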

However, I see a couple of issues/follow-up thoughts.

For example, we would probably need some config file saying that if you are 
using {{hdfs}} to talk to {{namenode1}} you must use {{conf A}}, while if you 
are writing to {{namenode2}} you should use {{conf B}}. I'm not sure how to 
express this. Copy-pasting the whole fat jar into two different plugin 
directories, with two different configs, is one option, but...
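
For illustration only, such a mapping could be as simple as a properties-style 
file keyed by namenode authority; everything below (file contents, paths, 
class name) is a hypothetical sketch, not a proposed format:

{code:java}
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfMappingSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical mapping: which Hadoop conf directory to use per namenode authority.
        // In a real setup this would live in a file shipped with the plugin.
        String mapping =
                "namenode1\\:8020=/opt/hadoop-conf-a\n"
              + "namenode2\\:8020=/opt/hadoop-conf-b\n";

        Properties confDirsByAuthority = new Properties();
        confDirsByAuthority.load(new StringReader(mapping));

        // A sink writing to hdfs://namenode2:8020/... would then pick up conf B.
        String authority = "namenode2:8020";
        System.out.println("conf dir for " + authority + ": "
                + confDirsByAuthority.getProperty(authority));
    }
}
{code}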

I don't think changes in configuration, like a different {{hdfs-site.xml}}, 
should force the creation of another fat jar, for the same reason as:
{quote}
They share the same schema "hdfs" and it would be inconvenient and confusing 
for users if we changed the schema.
{quote}
I agree that both sinks, writing to {{namenode1}} with {{conf A}} and to 
{{namenode2}} with {{conf B}}, should use the same schema, but they should 
also use the same plugin.

I have to think a bit more about this. Maybe we should decouple the concept of 
a plugin from the concept of the file system - one plugin could be used by 
different file system instances.
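
As a rough illustration of that decoupling (all names here are made up for the 
sketch, not an existing Flink interface), one plugin-level factory could hand 
out differently configured file system instances per URI authority:

{code:java}
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One plugin-level factory, many file system "instances": each authority gets its own
// configuration, but all of them live behind the same plugin (and its classloader).
public class PerAuthorityFactory {

    // Stand-in for a per-instance file system handle that carries its own configuration.
    static final class FsInstance {
        final String authority;
        final String confDir;

        FsInstance(String authority, String confDir) {
            this.authority = authority;
            this.confDir = confDir;
        }

        @Override
        public String toString() {
            return "FsInstance{" + authority + " -> " + confDir + "}";
        }
    }

    private final Map<String, String> confDirsByAuthority;
    private final Map<String, FsInstance> instances = new ConcurrentHashMap<>();

    PerAuthorityFactory(Map<String, String> confDirsByAuthority) {
        this.confDirsByAuthority = confDirsByAuthority;
    }

    FsInstance create(URI uri) {
        // Lazily create and cache one instance per authority; all share this single factory.
        return instances.computeIfAbsent(uri.getAuthority(),
                a -> new FsInstance(a, confDirsByAuthority.getOrDefault(a, "/etc/hadoop/conf")));
    }

    public static void main(String[] args) {
        PerAuthorityFactory factory = new PerAuthorityFactory(Map.of(
                "namenode1:8020", "/opt/hadoop-conf-a",
                "namenode2:8020", "/opt/hadoop-conf-b"));

        System.out.println(factory.create(URI.create("hdfs://namenode1:8020/tmp/out")));
        System.out.println(factory.create(URI.create("hdfs://namenode2:8020/tmp/out")));
    }
}
{code}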

> StreamFileSystemSink should support multiple HDFS plugins.
> --------------------------------------------------
>
>                 Key: FLINK-15378
>                 URL: https://issues.apache.org/jira/browse/FLINK-15378
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: ouyangwulin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>         Attachments: jobmananger.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [As reported on the mailing 
> list|https://lists.apache.org/thread.html/7a6b1e341bde0ef632a82f8d46c9c93da358244b6bac0d8d544d11cb%40%3Cuser.flink.apache.org%3E]
> Request 1:  FileSystem plugins should not affect the default YARN dependencies.
> Request 2:  StreamFileSystemSink should support multiple HDFS plugins under 
> the same schema.
> As the problem describes:
>     When I put a filesystem plugin into FLINK_HOME/plugins, and the class 
> '*com.filesystem.plugin.FileSystemFactoryEnhance*' implements 
> '*FileSystemFactory*', then when the JM starts it will call 
> FileSystem.initialize(configuration, 
> PluginUtils.createPluginManagerFromRootFolder(configuration)) to load the 
> factories into the map FileSystem#FS_FACTORIES, whose key is only the 
> schema. The TM/JM use the local hadoop conf A, while the user code uses 
> hadoop conf B from the 'filesystem plugin'; conf A and conf B point to 
> different hadoop clusters. The JM then fails to start, because the 
> BlobServer in the JM loads conf B to get the filesystem. The full log is 
> attached.
>  
> As a proposed fix:
>     use the schema plus a specific identifier as the key for 
> 'FileSystem#FS_FACTORIES'
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
