[ 
https://issues.apache.org/jira/browse/HADOOP-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499470#comment-13499470
 ] 

Yanbo Liang commented on HADOOP-9041:
-------------------------------------

Hi Alejandro,
My previous comment may not have been very clear. The detailed call stack is 
described below.
If users register org.apache.hadoop.fs.FsUrlStreamHandlerFactory as the current 
URLStreamHandlerFactory before the first call to 
FileSystem.getFileSystemClass()->FileSystem.loadFileSystems(), that call goes 
into an infinite loop:
1) org.apache.hadoop.fs.FsUrlStreamHandlerFactory has been registered as the 
current URLStreamHandlerFactory.
2) The user calls 
FileSystem.getFileSystemClass()->FileSystem.loadFileSystems().
3) Because FileSystem.loadFileSystems() has never run before step 2), its body 
is executed now.
4) FileSystem.loadFileSystems() uses ServiceLoader to load the FileSystem 
providers, such as hdfs, kfs and s3.
5) To do so, ServiceLoader has to read the list of FileSystem providers from a 
resource, such as a jar file on local disk. ServiceLoader addresses the jar 
file as a URL.
6) ServiceLoader creates a URL object and opens a stream to that URL.
7) The URL needs a handler for its protocol, such as "file:///", so it calls 
URL.getURLStreamHandler(), which indirectly calls 
FsUrlStreamHandlerFactory.createURLStreamHandler().
8) FsUrlStreamHandlerFactory.createURLStreamHandler() has to map the scheme or 
protocol to a FileSystem provider (if the jar file is on local disk, it needs 
the implementation of LocalFileSystem). But the FileSystem providers have not 
been loaded into memory yet, so it calls 
FileSystem.getFileSystemClass("file",conf)->FileSystem.loadFileSystems(). We 
are back at step 2) and stuck in an infinite loop.
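
The cycle in steps 2) to 8) can be reproduced in a toy model without Hadoop on 
the classpath. This is only a sketch: the class ReentrantLoadSketch, its 
fields, and the MAX_DEPTH guard are hypothetical and exist only so the demo 
terminates. The three methods merely stand in for the Hadoop methods of the 
same name and do not reproduce their real bodies.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the re-entrant lookup. None of this is Hadoop code. */
class ReentrantLoadSketch {
    static final Map<String, String> SCHEME_TO_IMPL = new HashMap<>();
    static boolean loaded = false;
    static int depth = 0;
    static final int MAX_DEPTH = 5; // hypothetical guard so the demo stops

    // Stands in for FileSystem.getFileSystemClass(scheme, conf).
    static String getFileSystemClass(String scheme) {
        if (!loaded) {
            loadFileSystems(); // steps 2) and 3): lazy load on first use
        }
        return SCHEME_TO_IMPL.get(scheme);
    }

    // Stands in for FileSystem.loadFileSystems().
    static void loadFileSystems() {
        if (++depth > MAX_DEPTH) {
            throw new IllegalStateException(
                "loadFileSystems re-entered " + depth + " times");
        }
        // Steps 4) to 7): reading the provider list opens a file: URL,
        // which asks the registered factory for a handler.
        createURLStreamHandler("file");
        // Never reached while the cycle is in place.
        SCHEME_TO_IMPL.put("file", "LocalFileSystem");
        loaded = true;
    }

    // Stands in for FsUrlStreamHandlerFactory.createURLStreamHandler().
    static String createURLStreamHandler(String protocol) {
        // Step 8): the factory needs the provider map, which is not loaded.
        return getFileSystemClass(protocol);
    }
}
```

Calling ReentrantLoadSketch.getFileSystemClass("file") never sets the loaded 
flag, because every attempt to load re-enters the loader before it finishes. 
Only the artificial guard ends the recursion here.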

Because URL handling depends closely on the concrete FileSystem 
implementations, we need to load those implementations before any URL-related 
operation. I propose calling FileSystem.getFileSystemClass("file",conf) in the 
constructor of FsUrlStreamHandlerFactory to solve this problem, because 
FsUrlStreamHandlerFactory must know at least the FileSystem implementation for 
the "file:///" scheme before it can work properly. 
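
The idea of loading eagerly in the constructor can be sketched in a toy model. 
This is not the attached patch: the class EagerLoadSketch, its nested 
HandlerFactory, the installed field and openUrl() are all hypothetical 
stand-ins. The sketch assumes that the factory is constructed before it is 
registered, so the load triggered by the constructor still goes through the 
default handler.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of eager loading in the factory constructor. Not Hadoop code. */
class EagerLoadSketch {
    static final Map<String, String> SCHEME_TO_IMPL = new HashMap<>();
    static boolean loaded = false;
    static int loadCalls = 0;
    static HandlerFactory installed = null; // models the registered factory

    // Stands in for FileSystem.getFileSystemClass(scheme, conf).
    static String getFileSystemClass(String scheme) {
        if (!loaded) {
            loadFileSystems();
        }
        return SCHEME_TO_IMPL.get(scheme);
    }

    // Stands in for FileSystem.loadFileSystems().
    static void loadFileSystems() {
        loadCalls++;
        openUrl("file"); // reading the provider list opens a file: URL
        SCHEME_TO_IMPL.put("file", "LocalFileSystem");
        loaded = true;
    }

    // Models URL handler lookup: uses the custom factory only if registered.
    static void openUrl(String protocol) {
        if (installed != null) {
            installed.createURLStreamHandler(protocol);
        }
    }

    // Stands in for FsUrlStreamHandlerFactory.
    static class HandlerFactory {
        HandlerFactory() {
            // Eager warm-up: providers are loaded while no custom
            // factory is registered yet, so the load can complete.
            getFileSystemClass("file");
        }

        String createURLStreamHandler(String protocol) {
            // Providers are already loaded, so this does not re-enter.
            return getFileSystemClass(protocol);
        }
    }
}
```

With this ordering the loader runs exactly once, inside the constructor, and 
later handler lookups only read the map that is already populated.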

The patch has been attached. Looking forward to your comments.
                
> FileSystem initialization can go into infinite loop
> ---------------------------------------------------
>
>                 Key: HADOOP-9041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9041
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.0.2-alpha
>            Reporter: Radim Kolar
>            Assignee: Yanbo Liang
>            Priority: Critical
>         Attachments: fstest.groovy, HADOOP-9041.patch, HADOOP-9041.patch, 
> HADOOP-9041.patch
>
>
> More information is there: https://jira.springsource.org/browse/SHDP-111
> Referenced source code from example is: 
> https://github.com/SpringSource/spring-hadoop/blob/master/src/main/java/org/springframework/data/hadoop/configuration/ConfigurationFactoryBean.java
> After isolating the cause, it looks like if you register 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory before calling 
> FileSystem.loadFileSystems(), then it goes into an infinite loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
