You have to use the FileSystem interface directly, just like FsShell does; see FsShell.mkdir() for an example. You will need to handle error cases, exceptions, etc. yourself. If you only want to do one or two things, it won't be too bad.

Could you file a JIRA for the issue you saw with the threads (with the example program you wrote and the stack trace below)?

thanks,
Raghu.

KrzyCube wrote:
First of all, thanks, Raghu.

Here's the exception info:
------------------------------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
------------------------------------------------------------------------
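
The trace shows Thread.start() being reached from the DFSClient constructor, by way of FsShell.run() -> FsShell.init() -> FileSystem.get(). That suggests every run() call may be building a new client that starts its own background thread. Note also that "unable to create new native thread" usually points at a thread or native-memory limit rather than a full Java heap. The sketch below is a JDK-only simulation of that pattern, not Hadoop code: BackgroundClient, ClientReuseDemo and their methods are hypothetical names invented for illustration. It compares creating one client per call with sharing a single client that is closed when the work is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a client that starts a background thread in its constructor. */
final class BackgroundClient implements AutoCloseable {
    /** Number of background threads that have been started and have not yet exited. */
    static final AtomicInteger LIVE = new AtomicInteger();

    private final CountDownLatch stop = new CountDownLatch(1);
    private final Thread worker;

    BackgroundClient() {
        LIVE.incrementAndGet();
        worker = new Thread(() -> {
            try {
                stop.await();                       // idle until close() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                LIVE.decrementAndGet();
            }
        }, "background-client-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Pretend operation; a real client would talk to the file system here. */
    boolean mkdir(String path) {
        return !path.isEmpty();
    }

    @Override
    public void close() {
        stop.countDown();                           // let the worker finish
        try {
            worker.join();                          // wait until it has really exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class ClientReuseDemo {
    /** One new client per call, none closed during the loop: live threads grow with the call count. */
    static int perCallClients(int calls) {
        List<BackgroundClient> created = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            BackgroundClient client = new BackgroundClient();
            client.mkdir("/tmp/dir" + i);
            created.add(client);
        }
        int live = BackgroundClient.LIVE.get();
        for (BackgroundClient client : created) {
            client.close();                         // cleanup so the demo itself does not leak
        }
        return live;
    }

    /** One shared client for all calls: a single background thread regardless of call count. */
    static int sharedClient(int calls) {
        try (BackgroundClient client = new BackgroundClient()) {
            for (int i = 0; i < calls; i++) {
                client.mkdir("/tmp/dir" + i);
            }
            return BackgroundClient.LIVE.get();
        }
    }

    public static void main(String[] args) {
        System.out.println("per-call clients, live threads: " + perCallClients(100));
        System.out.println("shared client, live threads:    " + sharedClient(100));
    }
}
```

If the real client behaves like this stand-in, the remedy is to obtain one file system handle up front, reuse it for every operation, and close it at the end, instead of going through FsShell.run() once per directory.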

So, is there a recommended API for these uses?
By "these" I mean uploading or downloading files and creating directories programmatically, even under concurrent operation.


Raghu Angadi wrote:

Can you get the stack trace of the threads that are left? It was not obvious from the code where a thread is started. It might be 'trash handler'.

You could add sleep(10sec) to give you enough time to get the trace.
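
One way to capture those traces from inside the program, instead of racing to attach an external tool, is Thread.getAllStackTraces(), which is part of the JDK. The helper below is only a sketch; the class and method names (ThreadDump, dumpAllThreads) are made up for this example. Running jstack against the process id, or sending SIGQUIT to the JVM on Unix, gives a similar dump from outside.

```java
import java.util.Map;

final class ThreadDump {
    /** Returns the name, daemon flag, state and stack frames of every live thread. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" daemon=").append(thread.isDaemon())
               .append(" state=").append(thread.getState())
               .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Call this after the loop that creates the directories,
        // so the dump shows whichever threads were left behind.
        System.out.print(dumpAllThreads());
    }
}
```

The thread names and the top frames in the output are usually enough to tell which class started each leftover thread.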

FsShell might not be designed for this use, but seems like a pretty useful feature.

Raghu.

KrzyCube wrote:
I have tried it the way TestDFSShell.java does;
here's my code:

------------------------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;

public class CustomInterface {
        Configuration conf;
        FsShell fs;

        public CustomInterface()
        {
                conf = new Configuration();
                fs = new FsShell();

                fs.setConf(conf);
        }

        // Runs "-mkdir <strPath + strDirName>" through FsShell and returns its exit code.
        public int createDir(String strDirName, String strPath) throws Exception
        {
                // exception handling omitted
                String[] strCmd = new String[2];
                strCmd[0] = "-mkdir";
                strCmd[1] = strPath + strDirName;
                return fs.run(strCmd);
        }
}
------------------------------------------------------------

Then I just call the createDir method:

for(int i =0 ; i < 100000 ; i ++)
{
    custom.createDir("someName");
}

This causes the Java VM process to hold many threads,
and these threads eat memory.
Once the JVM heap is eaten up, it throws exceptions.
A larger heap size only holds more threads; it does not fix the problem.

thanks.


Dhruba Borthakur wrote:
One example of programmatically using FsShell is in
src/test/org/apache/hadoop/dfs/TestDFSShell.java

Thanks,
dhruba

-----Original Message-----
From: KrzyCube [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 23, 2007 7:49 PM
To: [email protected]
Subject: Calling FsShell.doMain() hold so many threads


Hi there:

I have two questions:

Q1:
    I am trying to call FsShell.doMain() from my own code, which is only a simple wrapper around FsShell. But when I try to create many dirs (10000, etc.), an exception like "Not enough memory for more threads" is thrown, even though I have set -Xmx512m.
    I then watched the process info while the program was running and found that more and more threads are started during the run, eating more and more memory; all the threads stay there without exiting.
    Then I went to the source code and found that FsShell.main(), the entry point used from the terminal, has one line, "System.exit(return_value_of_doMain)". Does that mean the call to ToolBase.run(), which is implemented in FsShell.java, always creates a new thread that has to be forcibly terminated by System.exit() killing the process?
    If so, how can I write my own code to use Hadoop with FsShell in multi-threaded mode, or is there another way to do this?

Q2:
     I checked out the code from svn and ran it in Eclipse under Ubuntu 7.04 (I mention Eclipse only to indicate my environment).
     Out of curiosity, I wanted to see how much time FsShell.doMain() takes, so I used "new Date()" and computed the interval as "DateEnd.getTime() - DateBeg.getTime()".
     I found that even mkdir takes more than 1000 ms (as reported by getTime). With no arguments it takes 25 ms, but if I give it just a wrong argument, such as "-sl", it takes more than 1000 ms. Does that mean the argument check takes most of the time?
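
A single measurement cannot tell apart the cost of the call itself from any one-time setup that happens on first use (class loading, reading configuration, opening a connection). Whether that explains the numbers above is only a guess, but timing several consecutive calls makes it easy to check. The sketch below uses System.nanoTime() from the JDK; CallTimer and timeRuns are hypothetical names, and the workload in main() is a placeholder for the call being measured.

```java
final class CallTimer {
    /** Runs the task the given number of times and returns each run's elapsed time in milliseconds. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the call whose cost is in question.
        long[] ms = timeRuns(() -> new java.util.Date().toString(), 5);
        for (int i = 0; i < ms.length; i++) {
            System.out.println("run " + i + ": " + ms[i] + " ms");
        }
    }
}
```

If only the first run is slow and later runs are fast, the time is going into initialization rather than into argument checking.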

--
View this message in context:
http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11756139
Sent from the Hadoop Users mailing list archive at Nabble.com.







