[
https://issues.apache.org/jira/browse/HADOOP-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648868#action_12648868
]
Pete Wyckoff commented on HADOOP-4635:
--------------------------------------
On my 17.1 cluster I don't see mine leaking, but I do see that, because of the
Java GC, the memory use is hard to pin down. I mounted with -ordbuffer=67108864
and then ran head part-00000 > /dev/null 1000 times on one file; the memory
sometimes climbs as high as 1 GB, but then comes down and stays at about 550MB.
I also tried the code below and saw about the same behavior as with fuse. I
also tried using a 64MB buffer in fuse but short-circuiting the code so that it
issues such big reads directly to dfs, and the memory never grew much.
It may just be Java. Maybe you could set LIBHDFS_OPTS to the options that make
the GC write a log, but I don't know whether that will show how much memory is
actually in use by the JVM.
{code}
// Repeatedly read a file from HDFS into a 256 MB buffer and watch memory use.
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.dfs.*;

public class test {
  public static void main(String[] args) {
    try {
      int size = 256 * 1024 * 1024;    // 256 MB read buffer
      byte[] buf = new byte[size];
      FileSystem fs = FileSystem.get(new Configuration());
      for (int i = 0; i < 1000; i++) {
        FSDataInputStream fsi = fs.open(new Path("/some/path/to/part-00000"));
        fsi.readFully(0, buf);         // positional read of buf.length bytes from offset 0
        System.err.println(i);
        fsi.close();
      }
      Thread.sleep(60 * 1000);         // pause so memory can be checked with top/ps
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
{code}
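If you want to separate JVM heap growth from native (fuse/libhdfs) growth, a
minimal sketch of the same loop that also prints the JVM's own heap figures via
Runtime might help. The path and buffer size are the same placeholders as
above, and it assumes the file holds at least buf.length bytes. For the GC log,
the usual HotSpot flags are -verbose:gc -XX:+PrintGCDetails -Xloggc:<file>,
assuming fuse_dfs passes LIBHDFS_OPTS through to the embedded JVM.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HeapUsageTest {
  public static void main(String[] args) throws Exception {
    byte[] buf = new byte[256 * 1024 * 1024];   // same 256 MB buffer as above
    FileSystem fs = FileSystem.get(new Configuration());
    Runtime rt = Runtime.getRuntime();
    for (int i = 0; i < 1000; i++) {
      // placeholder path, same as in the test above
      FSDataInputStream in = fs.open(new Path("/some/path/to/part-00000"));
      in.readFully(0, buf);                     // assumes file >= buf.length bytes
      in.close();
      long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
      long heapMB = rt.totalMemory() / (1024 * 1024);
      System.err.println("iter " + i + ": heap used " + usedMB + " MB of " + heapMB + " MB");
    }
    Thread.sleep(60 * 1000);                    // time to compare against top/ps
  }
}
{code}
If the RSS reported by top keeps climbing while the heap figure printed here
stays flat, the growth would more likely be on the native side.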
> Memory leak ?
> -------------
>
> Key: HADOOP-4635
> URL: https://issues.apache.org/jira/browse/HADOOP-4635
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/fuse-dfs
> Affects Versions: 0.19.0, 0.20.0
> Reporter: Marc-Olivier Fleury
>
> I am running a process that needs to crawl a tree structure containing ~10K
> images, copy the images to the local disk, process these images, and copy
> them back to HDFS.
> My problem is the following: after about 10h of processing, the processes
> crash with a std::bad_alloc exception (I use Hadoop Pipes to run existing
> software). When running fuse_dfs in debug mode, I get an OutOfMemoryError
> saying that there is no more room in the heap.
> While the process is running, top and ps show fuse using an increasing amount
> of memory until some limit is reached. At that point the memory usage
> oscillates, which I suppose is due to the use of virtual memory.
> This leads me to conclude that there is a memory leak in fuse_dfs, since the
> only other programs running are Hadoop and the existing software, both of
> which have been thoroughly tested in the past.
> My problem is that my knowledge of memory leak tracking is rather limited, so
> I will need some instructions to get more insight into this issue.
> Thank you
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.