[
https://issues.apache.org/jira/browse/HADOOP-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499470#comment-13499470
]
Yanbo Liang commented on HADOOP-9041:
-------------------------------------
Hi Alejandro,
My previous comment may not have been very clear. The detailed call stack is
described as follows:
If users register org.apache.hadoop.fs.FsUrlStreamHandlerFactory as the current
URLStreamHandlerFactory before the first call to
FileSystem.getFileSystem()->FileSystem.loadFileSystems(), that call will go
into an infinite loop.
1) org.apache.hadoop.fs.FsUrlStreamHandlerFactory has been registered as the
current URLStreamHandlerFactory.
2) users call FileSystem.getFileSystem()->FileSystem.loadFileSystems().
3) Because FileSystem.loadFileSystems() has never been called before step 2),
the body of the function FileSystem.loadFileSystems() is executed now.
4) FileSystem.loadFileSystems() uses ServiceLoader to load the FileSystem
providers, such as hdfs, kfs, and s3.
5) When ServiceLoader runs, it needs to read the list of FileSystem providers
from a resource location, such as a jar file on local disk. ServiceLoader
addresses the jar file as a URL.
6) ServiceLoader creates a URL object and opens a stream to this URL.
7) The URL needs a handler for its protocol, such as "file:///", so it calls
URL.getURLStreamHandler(), which indirectly calls
FsUrlStreamHandlerFactory.createURLStreamHandler().
8) FsUrlStreamHandlerFactory.createURLStreamHandler() needs to recognize the
file system scheme or protocol by consulting the FileSystem providers (if the
jar file is on local disk, it needs to know the implementation of
LocalFileSystem). But at this point the FileSystem providers have not yet been
loaded into memory, so it calls
FileSystem.getFileSystem("file",conf)->FileSystem.loadFileSystems(). We are
back at step 2) and fall into an infinite loop.
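The cycle in steps 2) to 8) can be reproduced in miniature without Hadoop. The
following is only a sketch under stated assumptions: ReentrantLoadSketch,
lookup(), loadProviders(), createHandler() and the MAX_DEPTH guard are
hypothetical stand-ins, not the real FileSystem or FsUrlStreamHandlerFactory
code. The guard exists only so that the sketch stops with an exception instead
of recursing forever.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hadoop-free model of the re-entrant call chain described above.
class ReentrantLoadSketch {
    // Stand-ins for the FileSystem provider table and its "already loaded" flag.
    static final Map<String, String> PROVIDERS = new HashMap<>();
    static boolean loaded = false;

    // Guard added for this sketch only; the real code path has no such limit.
    static int depth = 0;
    static final int MAX_DEPTH = 5;

    // Models steps 2) and 3): the first lookup triggers provider loading.
    static String lookup(String scheme) {
        if (!loaded) {
            loadProviders();
        }
        return PROVIDERS.get(scheme);
    }

    // Models steps 4) to 6): loading providers means opening a "file" URL.
    static void loadProviders() {
        if (++depth > MAX_DEPTH) {
            throw new IllegalStateException("loadProviders re-entered before it finished");
        }
        // Step 7): opening the URL consults the registered handler factory.
        createHandler("file");
        // Never reached while the factory calls back into lookup().
        PROVIDERS.put("file", "LocalFileSystem");
        loaded = true;
    }

    // Models step 8): the factory needs the providers, which are not loaded yet.
    static String createHandler(String protocol) {
        return lookup(protocol);
    }

    public static void main(String[] args) {
        try {
            lookup("file");
        } catch (IllegalStateException e) {
            System.out.println("Cycle detected: " + e.getMessage());
        }
    }
}
```

The "loaded" flag is set only after the provider scan finishes, so every
nested lookup sees it as false and starts the scan again.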
Because URL handling depends closely on the concrete FileSystem
implementations, we need to load the FileSystem implementations before any
URL-related operation. I propose calling
FileSystem.getFileSystemClass("file",conf) in the constructor of
FsUrlStreamHandlerFactory to solve this problem, because
FsUrlStreamHandlerFactory must know at least the FileSystem implementation for
the scheme "file:///" before it can work properly.
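The idea of loading eagerly in the constructor can be sketched in the same
Hadoop-free style. This is an illustration under assumptions, not the attached
patch: EagerLoadSketch, HandlerFactory, registered and loadCount are
hypothetical names that model FileSystem, FsUrlStreamHandlerFactory and the
factory slot held by java.net.URL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hadoop-free model of eager provider loading in the factory constructor.
class EagerLoadSketch {
    // Stand-ins for the FileSystem provider table and its "already loaded" flag.
    static final Map<String, String> PROVIDERS = new HashMap<>();
    static boolean loaded = false;
    static int loadCount = 0;

    // Stand-in for the single URLStreamHandlerFactory slot in java.net.URL.
    static HandlerFactory registered = null;

    static String lookup(String scheme) {
        if (!loaded) {
            loadProviders();
        }
        return PROVIDERS.get(scheme);
    }

    static void loadProviders() {
        loadCount++;
        // Reading the provider list opens a "file" URL; a registered factory
        // would be consulted here, which is what caused the cycle.
        if (registered != null) {
            registered.createHandler("file");
        }
        PROVIDERS.put("file", "LocalFileSystem");
        loaded = true;
    }

    static class HandlerFactory {
        HandlerFactory() {
            // Resolve "file" while this factory is not yet registered, so the
            // provider scan runs with no callback into createHandler().
            lookup("file");
        }

        String createHandler(String protocol) {
            // Providers are already loaded, so this no longer triggers a scan.
            return lookup(protocol);
        }
    }

    public static void main(String[] args) {
        registered = new HandlerFactory();
        System.out.println(registered.createHandler("file"));
        System.out.println("provider scans: " + loadCount);
    }
}
```

The ordering is the point of the sketch: the constructor completes the scan
before the factory object exists for anyone to register, so later handler
lookups find the providers already in memory.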
The patch has been attached. Looking forward to your comments.
> FileSystem initialization can go into infinite loop
> ---------------------------------------------------
>
> Key: HADOOP-9041
> URL: https://issues.apache.org/jira/browse/HADOOP-9041
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.0.2-alpha
> Reporter: Radim Kolar
> Assignee: Yanbo Liang
> Priority: Critical
> Attachments: fstest.groovy, HADOOP-9041.patch, HADOOP-9041.patch,
> HADOOP-9041.patch
>
>
> More information is here: https://jira.springsource.org/browse/SHDP-111
> The source code referenced in the example is:
> https://github.com/SpringSource/spring-hadoop/blob/master/src/main/java/org/springframework/data/hadoop/configuration/ConfigurationFactoryBean.java
> From isolating the cause, it looks like if you register
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory before calling
> FileSystem.loadFileSystems(), then it goes into an infinite loop.