[ 
https://issues.apache.org/jira/browse/HADOOP-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888509#comment-15888509
 ] 

Jason Lowe commented on HADOOP-14132:
-------------------------------------

+1.  We load a _lot_ of classes as it is just to do anything, and the 
filesystem service loader is a large part of that.

Unless the service loader metadata is considered part of our API, I think we 
can migrate existing filesystems over to this as well.  The filesystem load can 
do a two-part scan: the first part loads the old-style FileSystem-derived 
classes directly, and the second part runs a service loader for the new, 
lightweight filesystem descriptor classes.  Once the new system is in place, I 
think we can migrate the old filesystems over to it by removing their old 
service loader metadata entries and providing new ones for the lightweight 
descriptor.  Then the old filesystems will still load, but without requiring 
all of their dependencies to load during the service loader scan.  The only 
gotcha I can think of is whether anyone is depending upon the existing service 
loader metadata in some way, since that would be removed for the legacy 
filesystems as they migrate to the new scheme.
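
To make the two-part scan concrete, here is a rough sketch of what it could look like. It is not code from any patch: `FsDescriptor`, `LazyFsRegistry`, and all method names are made up for illustration, and the real FileSystem cache would look different. The point it shows is that the descriptor scan records only class names, so an implementation class (and its dependencies) is loaded only when its scheme is actually requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical lightweight descriptor; not an existing Hadoop interface.
interface FsDescriptor {
    String scheme();
    String fileSystemClassName();
}

// Hypothetical registry sketching the two-part scan.
final class LazyFsRegistry {
    // Part one: old-style entries, where the implementation class is already loaded.
    private final Map<String, Class<?>> legacyByScheme = new HashMap<>();
    // Part two: new-style entries, where only the class name is known so far.
    private final Map<String, String> classNameByScheme = new HashMap<>();

    void registerLegacy(String scheme, Class<?> impl) {
        legacyByScheme.putIfAbsent(scheme, impl);
    }

    // Records class names only; no implementation class is loaded here.
    void scanDescriptors(Iterable<? extends FsDescriptor> descriptors) {
        for (FsDescriptor d : descriptors) {
            classNameByScheme.putIfAbsent(d.scheme(), d.fileSystemClassName());
        }
    }

    // ServiceLoader instantiates the small descriptor classes, nothing else.
    void scanWithServiceLoader() {
        scanDescriptors(ServiceLoader.load(FsDescriptor.class));
    }

    // The implementation class is loaded on first lookup of its scheme.
    Class<?> resolve(String scheme) {
        Class<?> legacy = legacyByScheme.get(scheme);
        if (legacy != null) {
            return legacy;
        }
        String name = classNameByScheme.get(scheme);
        if (name == null) {
            throw new IllegalArgumentException("No filesystem for scheme: " + scheme);
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load " + name, e);
        }
    }
}
```

In this sketch a legacy entry wins over a descriptor for the same scheme; whether that is the right precedence is an open design choice.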

> Filesystem discovery to stop loading implementation classes
> -----------------------------------------------------------
>
>                 Key: HADOOP-14132
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14132
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Integration testing of Hadoop with HADOOP-14040 has shown that the 
> move to a shaded AWS JAR is slowing all Hadoop client code down.
> I believe this is due to how we use service discovery to identify FS 
> implementations: the implementation classes themselves are instantiated.
> This has known problems today with classloading, but clearly impacts 
> performance too, especially with complex transitive dependencies unique to 
> the loaded class.
> Proposed: have lightweight service declaration classes which implement an 
> interface declaring
> # scheme
> # classname of FileSystem impl
> # classname of AbstractFS impl
> # homepage (for third party code, support, etc)
> These are what we register and scan in the FS to look for services.
> This leaves the question of what to do for existing filesystems. I 
> think we'll need to retain the old code for external ones, while moving the 
> Hadoop modules to the new ones.
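
For illustration, the proposed service declaration could look something like the sketch below. The interface name, method names, and the example filesystem are all hypothetical; nothing here is taken from an actual patch. Such a descriptor would be registered the usual `java.util.ServiceLoader` way, through a provider-configuration file under `META-INF/services` named after the interface's fully qualified name.

```java
// Hypothetical descriptor interface covering the four proposed fields.
interface FileSystemDescriptor {
    // URI scheme served by the filesystem, e.g. "example" for example://...
    String getScheme();

    // Fully qualified class name of the FileSystem implementation.
    String getFileSystemClassName();

    // Fully qualified class name of the AbstractFS implementation.
    String getAbstractFileSystemClassName();

    // Homepage for third-party code, support, etc.
    String getHomepage();
}

// Made-up third-party descriptor. It returns only strings, so loading it
// pulls in none of the filesystem's own dependencies.
final class ExampleFsDescriptor implements FileSystemDescriptor {
    @Override
    public String getScheme() {
        return "example";
    }

    @Override
    public String getFileSystemClassName() {
        return "com.example.fs.ExampleFileSystem"; // placeholder class name
    }

    @Override
    public String getAbstractFileSystemClassName() {
        return "com.example.fs.ExampleFs"; // placeholder class name
    }

    @Override
    public String getHomepage() {
        return "https://example.com/"; // placeholder URL
    }
}
```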



