[ 
https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960141#comment-15960141
 ] 

Siddharth Seth commented on HADOOP-14138:
-----------------------------------------

[~steve_l] - I understand the mechanics behind *-default.xml and *-site.xml. 
When I said "If someone wants to use s3a, I'd expect them to explicitly set it 
up in their Configuration," - their own Configuration could well be 
core-site.xml, which will then be loaded by all Hadoop services.

What I'm asking is why s3a gets special treatment, and an entry in 
core-default.xml.  Along with that, the 5+ additional s3a settings - why do 
they need to be defined in core-default.xml? It should be possible to have the 
default values in code. The XML entries could be a separate template, which 
users can include, to get all relevant settings (if custom settings are 
required). Without custom settings, the service loader approach is sufficient 
to get s3a functional, as long as the jar is available.
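
A minimal sketch of what "default values in code" could look like. This is 
an illustration, not the actual s3a code: the class name and the getInt helper 
are hypothetical, a plain Map stands in for Hadoop's Configuration, and the 
default value shown is illustrative rather than the real shipped default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-style configuration lookup; not the real API.
class CodeDefaultsSketch {
    // The key is an s3a property name; the default value is illustrative only.
    static final String MAX_CONNECTIONS_KEY = "fs.s3a.connection.maximum";
    static final int MAX_CONNECTIONS_DEFAULT = 15;

    // Return the user's setting if present, otherwise the default compiled into the code.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No XML entry and no user override: the code default applies.
        System.out.println(getInt(conf, MAX_CONNECTIONS_KEY, MAX_CONNECTIONS_DEFAULT));
        // A user override (e.g. from a site file) takes precedence.
        conf.put(MAX_CONNECTIONS_KEY, "64");
        System.out.println(getInt(conf, MAX_CONNECTIONS_KEY, MAX_CONNECTIONS_DEFAULT));
    }
}
```

With this pattern the XML file is only needed for overrides, so a missing or 
stale default entry cannot change behaviour.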

HDFS does not have an entry in core-default, and relies upon the ServiceLoader 
approach. (fs.hdfs.impl does not exist. fs.AbstractFileSystem.hdfs.impl exists 
- I don't know what this is used for). 

core-default.xml, to me at least, serves more as documentation of defaults. The 
files can go out of sync with the default values defined in code, 
YarnConfiguration for example. It takes additional effort to keep the files in 
sync. There are JIRAs to remove all the *-default.xml files, in favor of code 
defaults (I don't expect these to be fixed soon since such changes would be 
incompatible). For most parameters in these files, the code has default values 
(all the IPC defaults, for example).
I suspect nothing has broken so far, because the defaults exist in code.

In terms of the s3a and service loader problems, HADOOP-14132 sounds like a 
very good fix to have. If I'm understanding this correctly, general FS 
operations will be faster if we don't load all filesystems on the classpath. 
I'm worried that we're introducing a new dependency on core-default by making 
this change, while I think we should be going in the opposite direction and 
getting rid of dependencies on these files.
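
To make the trade-off concrete, here is a rough sketch of a scheme lookup that 
checks an explicit fs.<scheme>.impl entry first and only falls back to a 
classpath scan when no entry exists. It is not a copy of the Hadoop FileSystem 
code: the class is hypothetical, a Map stands in for the Configuration, and a 
Supplier stands in for the ServiceLoader scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical resolver; illustrates "config entry first, discovery only on demand".
class SchemeResolverSketch {
    private final Map<String, String> conf;
    // Stand-in for an expensive ServiceLoader scan of every filesystem on the classpath.
    private final Supplier<Map<String, String>> discovery;
    private Map<String, String> discovered; // stays null until a scan is actually needed
    int scans = 0;                          // counts how often the expensive scan ran

    SchemeResolverSketch(Map<String, String> conf,
                         Supplier<Map<String, String>> discovery) {
        this.conf = conf;
        this.discovery = discovery;
    }

    // Returns the implementation class name for a scheme, or null if unknown.
    String resolve(String scheme) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl != null) {
            return impl; // explicit entry: no classpath scan, no extra classloading
        }
        if (discovered == null) {
            discovered = discovery.get(); // scan once, on first miss
            scans++;
        }
        return discovered.get(scheme);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        Map<String, String> onClasspath = new HashMap<>();
        onClasspath.put("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");

        SchemeResolverSketch r = new SchemeResolverSketch(conf, () -> onClasspath);
        System.out.println(r.resolve("s3a") + " scans=" + r.scans);
        System.out.println(r.resolve("hdfs") + " scans=" + r.scans);
    }
}
```

The explicit entry avoids the scan, but it has to live somewhere (a default 
file, a site file, or code), which is the dependency being debated here.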



> Remove S3A ref from META-INF service discovery, rely on existing core-default 
> entry
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-14138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14138
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>
>         Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
>
>
> As discussed in HADOOP-14132, the shaded AWS library is killing performance 
> starting all hadoop operations, due to classloading on FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in 
> core-default.xml, *we don't need service discovery here*
> Proposed:
> # cut the entry from 
> {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file 
> exclusively for s3a entries
> I want this one in first as it's a major performance regression, and one we 
> could actually backport to 2.7.x, just to improve load time slightly there too



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
