Here is a more detailed description of the problem I hit
while calling FsShell methods programmatically.
[I have not yet tried the FileSystem interfaces that Raghu referred to earlier.]

Description:
        I am trying to create directories (the same happens with uploads and
other operations) by calling FsShell's interface for a large number of files,
for example 1000000.
        After running for some time, an exception is thrown.

here's the exception info:
------------------------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
------------------------------------------------------------------------        

I then checked Task Manager and found several thousand threads in the
javaw.exe process (launched from Eclipse 3.2, Java 1.6) that runs this code.
Threads keep being created and none of them terminates until the exception
is thrown and the process is killed.
Here is the snapshot I captured:
--------------------------------------------------------------------------------
Image Name | CPU | Mem Usage | Peak Mem Usage | Threads | I/O Reads | I/O Read Bytes
--------------------------------------------------------------------------------
javaw.exe  | 31  | 147,140 K | 147,140 K      | 5,059   | 1,061     | 3,372,459
--------------------------------------------------------------------------------
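
Raghu asked for the stack traces of the threads that are left. A minimal sketch
of one way to collect them from inside the JVM, using only the standard
Thread.getAllStackTraces() call. The class name ThreadDump and its two helper
methods are hypothetical and are not part of Hadoop:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: summarises the live threads of this JVM.
final class ThreadDump {

    // Counts live threads by name, with a trailing number replaced by '#',
    // so that many similarly named threads collapse into one entry.
    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = t.getName().replaceAll("\\d+$", "#");
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Prints each live thread followed by its current stack frames.
    static void dumpAll() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.err.println(e.getKey().getName() + " (daemon=" + e.getKey().isDaemon() + ")");
            for (StackTraceElement frame : e.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countByName());
        dumpAll();
    }
}
```

Calling ThreadDump.dumpAll() once every few thousand iterations of the test
loop should show what the leftover threads are doing, which may help to find
the code that starts them.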
        

Here is my test code, which calls FsShell.run():
--------------------------------------------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;

public class CustomInterface
{
        private final Configuration conf;

        public CustomInterface()
        {
                conf = new Configuration();
        }

        // Runs "-mkdir <dirName>" through FsShell and returns its exit code
        // (-1 if an exception was thrown).
        public int createDir(String dirName)
        {
                int iRet = -1;
                try
                {
                        String[] strCmd = new String[] { "-mkdir", dirName };

                        // I also tried making shell a member of this class,
                        // but the problem was exactly the same.
                        FsShell shell = new FsShell();
                        shell.setConf(conf);
                        iRet = shell.run(strCmd);
                }
                catch (Exception e)
                {
                        e.printStackTrace();
                }
                return iRet;
        }

        public static void main(String[] args)
        {
                CustomInterface ci = new CustomInterface();

                for (int i = 0; i < 1000000; i++)
                {
                        // Each call ends up creating a new thread; at around
                        // i = 20000 the exception is thrown (with -Xmx512m set
                        // in the hope of accommodating that many threads).
                        ci.createDir("mercury_test/java");
                        System.out.println("create mercury_test/java ok");
                }
        }
}
---------------------------------------------------------------------------
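
The stack trace shows the thread being started in the DFSClient constructor,
which is reached from FsShell.init() on every run() call. A minimal sketch of
that pattern, and of the effect of sharing one client instead, using only the
JDK. BackgroundClient is a hypothetical stand-in for any client that starts a
thread in its constructor; it is not Hadoop code, and its mkdir() does no real
work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientReuseSketch {
    // Number of background threads currently alive across all clients.
    static final AtomicInteger liveWorkers = new AtomicInteger();

    // Hypothetical client: starts one background thread per instance.
    static final class BackgroundClient {
        private final Thread worker;

        BackgroundClient() {
            liveWorkers.incrementAndGet();
            worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(Long.MAX_VALUE); // idle until close() interrupts
                    } catch (InterruptedException e) {
                        // interrupted by close(): fall through and exit
                    } finally {
                        liveWorkers.decrementAndGet();
                    }
                }
            }, "client-worker");
            worker.setDaemon(true);
            worker.start();
        }

        // Placeholder for a real operation; only validates its argument.
        boolean mkdir(String dir) {
            return dir != null && dir.length() > 0;
        }

        // Stops the background thread and waits for it to exit.
        void close() {
            worker.interrupt();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Pattern from the test code: a new client per call, none closed while
    // the loop runs. Returns the number of live threads after the loop.
    static int clientPerCall(int calls) {
        List<BackgroundClient> created = new ArrayList<BackgroundClient>();
        for (int i = 0; i < calls; i++) {
            BackgroundClient c = new BackgroundClient();
            c.mkdir("mercury_test/java");
            created.add(c); // kept only so the sketch can clean up afterwards
        }
        int live = liveWorkers.get();
        for (BackgroundClient c : created) {
            c.close();
        }
        return live;
    }

    // Alternative: one client shared by all calls and closed at the end.
    static int sharedClient(int calls) {
        BackgroundClient c = new BackgroundClient();
        try {
            for (int i = 0; i < calls; i++) {
                c.mkdir("mercury_test/java");
            }
            return liveWorkers.get();
        } finally {
            c.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("client per call: " + clientPerCall(100) + " live threads");
        System.out.println("shared client:   " + sharedClient(100) + " live threads");
    }
}
```

With Hadoop on the classpath, the analogous change would be to obtain a single
FileSystem with FileSystem.get(conf) and call its mkdirs(Path) in the loop
instead of going through FsShell. Whether that avoids the thread growth on this
version is an assumption that I have not verified.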


Dhruba Borthakur wrote:
> 
> Please try this attached patch, let me know if it works.
> 
> Thanks,
> dhruba
> 
> -----Original Message-----
> From: KrzyCube [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 24, 2007 6:19 PM
> To: [email protected]
> Subject: Re: Calling FsShell.doMain() hold so many threads
> 
> 
> First of all, thanks, Raghu.
> 
> here's the exception info:
> ------------------------------------------------------------------------
> Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Unknown Source)
> at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
> at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
> at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
> at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
> at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
> at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
> at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
> ------------------------------------------------------------------------
> 
> So is there a recommended API for this kind of use?
> By "this kind of use" I mean uploading or downloading files and creating
> directories programmatically, possibly from concurrent operations.
> 
> 
> Raghu Angadi wrote:
>> 
>> 
>> Can you get the stack trace of the threads that are left? It was not 
>> obvious from the code where a thread is started. It might be 'trash 
>> handler'.
>> 
>> You could add sleep(10sec) to give you enough time to get the trace.
>> 
>> FsShell might not be designed for this use, but seems like a pretty 
>> useful feature.
>> 
>> Raghu.
>> 
>> KrzyCube wrote:
>>> I have tried the way TestDFSShell.java does,
>>> here's my code:
>>> 
>>> ------------------------------------------------------------
>>> public class CustomInterface 
>>> {   
>>>     Configuration conf ;
>>>     FsShell fs ;
>>>     
>>>     public CustomInterface()
>>>     {
>>>             conf = new Configuration();
>>>             fs = new FsShell();
>>>             
>>>             fs.setConf(conf);
>>>     }
>>> 
>>>         public int createDir(String strDirName,String strPath)
>>>     {
>>>                 // omit exception catch
>>>             int iRet = 0;
>>>             strPath += strDirName;
>>>             String[] strCmd = new String[2];
>>>             strCmd[0] = "-mkdir";
>>>             strCmd[1] = strPath;            
>>>             return fs.run(strCmd);
>>>     }       
>>> }
>>> ------------------------------------------------------------
>>> 
>>> Then I just call the createDir method:
>>> 
>>> for(int i =0 ; i < 100000 ; i ++)
>>> {
>>>     custom.createDir("someName");
>>> }
>>> 
>>> This causes the JVM process to hold many threads,
>>> and these threads consume memory.
>>> Once the JVM heap is used up, exceptions are thrown.
>>> A larger heap size only allows more threads; it does not fix the problem.
>>> 
>>> thanks.
>>> 
>>> 
>>> Dhruba Borthakur wrote:
>>>> One example of programmatically using FsShell is in
>>>> src/test/org/apache/hadoop/dfs/TestDFSShell.java
>>>>
>>>> Thanks,
>>>> dhruba
>>>>
>>>> -----Original Message-----
>>>> From: KrzyCube [mailto:[EMAIL PROTECTED] 
>>>> Sent: Monday, July 23, 2007 7:49 PM
>>>> To: [email protected]
>>>> Subject: Calling FsShell.doMain() hold so many threads
>>>>
>>>>
>>>> Hi there:
>>>>
>>>> I have two questions:
>>>>
>>>> Q1:
>>>>     I am trying to call FsShell.doMain() from my own code, which is only
>>>> a simple wrapper around FsShell.
>>>> But when I try to create many directories (10000, say), an exception like
>>>> "Not enough memory for more threads" is thrown, even though I have set
>>>> -Xmx512m.
>>>>     I then looked at the process info while the program was running and
>>>> found that more and more threads are started during the run, using more
>>>> and more memory; all of the threads stay around without exiting.
>>>>     I then went to the source code and found that FsShell.main(), used
>>>> for the terminal call, contains the line
>>>> "System.exit(return_value_of_doMain)".
>>>> Does that mean that ToolBase.run(), as implemented in FsShell.java,
>>>> always creates a new thread that has to be forcibly terminated by
>>>> System.exit() killing the process?
>>>>     If so, how can I write my own code to use Hadoop with FsShell in
>>>> multi-threaded mode, or is there another way to do this?
>>>>
>>>> Q2:
>>>>      I checked the code out from svn and ran it in Eclipse under Ubuntu
>>>> 7.04 (I mention Eclipse only to describe my environment).
>>>>      Just out of curiosity, I wanted to see how long FsShell.doMain()
>>>> takes, so I used "new Date()" and computed the interval with
>>>> "DateEnd.getTime() - DateBeg.getTime()".
>>>>      I found that even mkdir takes more than 1000 (as reported by
>>>> getTime(), i.e. milliseconds). With no arguments it takes 25, but even
>>>> with an invalid argument such as "-sl" it takes more than 1000. Does that
>>>> mean argument checking accounts for most of the time?
>>>>
>>>> -- 
>>>> View this message in context:
>>>>
>>>> http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11756139
>>>> Sent from the Hadoop Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>>
>>>>
>>> 
>> 
>> 
>> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11774684
> Sent from the Hadoop Users mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11779568
Sent from the Hadoop Users mailing list archive at Nabble.com.
