Here is a more detailed description of the problem, which I hit while
calling FsShell methods programmatically
[I have not yet tried the FileSystem interfaces that Raghu referred to
earlier].

Description:
I am trying to create directories [the same happens for uploads and
other operations] by calling FsShell's interface for a large number of
files, e.g. 1,000,000. After running for some time, an exception is
thrown.

Here's the exception info:
------------------------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
    at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
    at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
    at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
    at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
    at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
    at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
------------------------------------------------------------------------
Then I checked the task monitor and found several thousand threads in
the javaw.exe process running this program [via Eclipse 3.2, Java 1.6].
Threads keep being created and none of them terminates, until the
exception is thrown and the process is killed.
Here's the snapshot I captured:
--------------------------------------------------------------------------------
imageName | cpu | MemUsage | Peak MemUsage | threads | i/o reads | i/o read bytes
javaw.exe |  31 | 147,140k | 147,140k      |   5,059 |     1,061 |      3,372,459
--------------------------------------------------------------------------------
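To see what those leaked threads are actually doing without attaching a debugger, one option is to dump all live stacks from inside the process with Thread.getAllStackTraces(), e.g. right before the loop runs out of memory. This is a sketch of my own; ThreadDump is a hypothetical helper class, not Hadoop code:

```java
import java.util.Map;

public class ThreadDump
{
    // Print the name, state and stack of every live thread in this JVM,
    // and return how many there are.
    public static int dumpAll()
    {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet())
        {
            Thread t = e.getKey();
            System.out.println(t.getName() + " [" + t.getState() + "]");
            for (StackTraceElement frame : e.getValue())
                System.out.println("    at " + frame);
        }
        return traces.size();
    }

    public static void main(String[] args)
    {
        System.out.println("live threads: " + dumpAll());
    }
}
```

Calling ThreadDump.dumpAll() inside the mkdir loop should show which class started the surviving threads and where they are blocked.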
Here's my test code calling FsShell.run():
--------------------------------------------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;

public class CustomInterface
{
    Configuration conf;

    public CustomInterface()
    {
        conf = new Configuration();
    }

    public int createDir(String dirName)
    {
        int iRet = 0;
        try
        {
            String[] strCmd = new String[2];
            strCmd[0] = "-mkdir";
            strCmd[1] = dirName;
            // I also tried defining shell as a member of this class,
            // but the problem is the same
            FsShell shell = new FsShell();
            shell.setConf(conf);
            iRet = shell.run(strCmd);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        return iRet;
    }

    public static void main(String[] args)
    {
        CustomInterface ci = new CustomInterface();
        for (int i = 0; i <= 1000000; i++)
        {
            // each call creates a new thread internally; around
            // i = 20000 the exception is thrown
            // (with -Xmx512m set to hold that many threads)
            ci.createDir("mercury_test/java");
            System.out.println("create mercury_test/java ok");
        }
    }
}
---------------------------------------------------------------------------
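For what it's worth, the trace above suggests that every FsShell.run() constructs a new DFSClient (the DFSClient.&lt;init&gt; frame), whose constructor starts a background thread that is never stopped, so one thread leaks per call. The snippet below is a self-contained simulation of that pattern with no Hadoop dependency; FakeClient, ThreadLeakDemo and their thread behavior are hypothetical stand-ins, not Hadoop code. It shows that creating a client per operation leaks one thread each time, while reusing a single client and closing it leaves none behind:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLeakDemo
{
    // Hypothetical stand-in for a client whose constructor starts a
    // long-lived background thread that runs until close() is called.
    static class FakeClient
    {
        final Thread keepalive;
        volatile boolean closed = false;

        FakeClient()
        {
            keepalive = new Thread(new Runnable() {
                public void run()
                {
                    while (!closed)
                    {
                        try { Thread.sleep(50); }
                        catch (InterruptedException e) { return; }
                    }
                }
            });
            keepalive.setDaemon(true);
            keepalive.start();
        }

        void mkdir(String name) { /* pretend to do an RPC */ }

        void close()
        {
            closed = true;
            keepalive.interrupt();
        }
    }

    // Leaky pattern: a new client per operation, never closed.
    // Returns how many background threads are still alive afterwards.
    static int leakedAliveCount(int n)
    {
        List<FakeClient> clients = new ArrayList<FakeClient>();
        for (int i = 0; i < n; i++)
        {
            FakeClient c = new FakeClient();
            c.mkdir("mercury_test/java");
            clients.add(c);                      // never closed
        }
        int alive = 0;
        for (FakeClient c : clients)
            if (c.keepalive.isAlive()) alive++;
        for (FakeClient c : clients) c.close();  // clean up the demo
        return alive;
    }

    // Fixed pattern: one shared client for all operations, closed when
    // done. Returns whether its thread survived the close.
    static boolean sharedAliveAfterClose(int n) throws InterruptedException
    {
        FakeClient shared = new FakeClient();
        for (int i = 0; i < n; i++) shared.mkdir("mercury_test/java");
        shared.close();
        shared.keepalive.join(1000);
        return shared.keepalive.isAlive();
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("leaked threads alive: " + leakedAliveCount(100));
        System.out.println("shared thread alive after close: " + sharedAliveAfterClose(100));
    }
}
```

With the real API, the analogous fix would presumably be to obtain one FileSystem with FileSystem.get(conf) and call fs.mkdirs(new Path(dirName)) inside the loop, instead of constructing a new FsShell per call.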
Dhruba Borthakur wrote:
>
> Please try this attached patch, let me know if it works.
>
> Thanks,
> dhruba
>
> -----Original Message-----
> From: KrzyCube [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 6:19 PM
> To: [email protected]
> Subject: Re: Calling FsShell.doMain() hold so many threads
>
>
> First of all, thanks, Raghu.
>
> here's the exception info:
> ------------------------------------------------------------------------
> Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
>     at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
>     at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
>     at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
>     at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
>     at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
>     at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
> ------------------------------------------------------------------------
>
> Then, is there any recommended API for this use case?
> By "this" I mean: uploading or downloading files and creating
> directories programmatically, even under concurrent operations.
>
>
> Raghu Angadi wrote:
>>
>>
>> Can you get the stack trace of the threads that are left? It was not
>> obvious from the code where a thread is started. It might be 'trash
>> handler'.
>>
>> You could add sleep(10sec) to give you enough time to get the trace.
>>
>> FsShell might not be designed for this use, but seems like a pretty
>> useful feature.
>>
>> Raghu.
>>
>> KrzyCube wrote:
>>> I have tried doing it the way TestDFSShell.java does;
>>> here's my code:
>>>
>>> ------------------------------------------------------------
>>> public class CustomInterface
>>> {
>>>     Configuration conf;
>>>     FsShell fs;
>>>
>>>     public CustomInterface()
>>>     {
>>>         conf = new Configuration();
>>>         fs = new FsShell();
>>>         fs.setConf(conf);
>>>     }
>>>
>>>     public int createDir(String strDirName, String strPath)
>>>     {
>>>         // exception handling omitted
>>>         strPath += strDirName;
>>>         String[] strCmd = new String[2];
>>>         strCmd[0] = "-mkdir";
>>>         strCmd[1] = strPath;
>>>         return fs.run(strCmd);
>>>     }
>>> }
>>> ------------------------------------------------------------
>>>
>>> Then I just call the createDir method:
>>>
>>> for (int i = 0; i < 100000; i++)
>>> {
>>>     custom.createDir("someName", "somePath/");
>>> }
>>>
>>> This causes the JVM process to hold many threads, and these threads
>>> eat memory until the heap is exhausted and exceptions are thrown.
>>> A larger heap only holds more threads; it does not fix the problem.
>>>
>>> thanks.
>>>
>>>
>>> Dhruba Borthakur wrote:
>>>> One example of programmatically using FsShell is in
>>>> src/test/org/apache/hadoop/dfs/TestDFSShell.java
>>>>
>>>> Thanks,
>>>> dhruba
>>>>
>>>> -----Original Message-----
>>>> From: KrzyCube [mailto:[EMAIL PROTECTED]
>>>> Sent: Monday, July 23, 2007 7:49 PM
>>>> To: [email protected]
>>>> Subject: Calling FsShell.doMain() hold so many threads
>>>>
>>>>
>>>> Hi there:
>>>>
>>>> I have two questions:
>>>>
>>>> Q1:
>>>> I am trying to call FsShell.doMain() from my own code, which is
>>>> only a thin wrapper around FsShell.
>>>> But when I try to create many dirs (10000 or so), an exception like
>>>> "Not enough memory for more threads" is thrown, even though I have
>>>> set -Xmx512m.
>>>> While the program was running I watched the process info and found
>>>> more and more threads being created, eating more and more memory;
>>>> none of them ever exits.
>>>> Then I went to the source code and found that FsShell.main(), for
>>>> the terminal case, has the line
>>>> "System.exit(return_value_of_doMain)".
>>>> Does that mean the call to ToolBase.run(), as implemented in
>>>> FsShell.java, always creates a new thread that has to be forcibly
>>>> terminated by System.exit() killing the process?
>>>> If so, how can I write my own code to use Hadoop through FsShell
>>>> in multi-threaded mode, or is there another way to do this?
>>>>
>>>> Q2:
>>>> I checked out the code from svn and ran it in Eclipse [the only
>>>> reason I mention Eclipse is to indicate my environment], under
>>>> Ubuntu 7.04.
>>>> Out of curiosity, I wanted to see how much time FsShell.doMain()
>>>> takes, so I used "new Date()" and measured the interval with
>>>> "DateEnd.getTime() - DateBeg.getTime()".
>>>> I found that even mkdir takes more than 1000 ms [as getTime()
>>>> shows]. With no arguments it takes 25 ms, but even if I just give
>>>> it a wrong argument such as "-sl" it takes more than 1000 ms.
>>>> Does that mean the argument check takes most of the time?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>
>
--
View this message in context:
http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11779568
Sent from the Hadoop Users mailing list archive at Nabble.com.