Kelvin, this is the year 2010 and computer programs should not be that fragile. My code is just a quick toy problem, written to find out why my real problem kept failing. Before I posted, I searched many documents and read through the API, and there is no clear instruction telling me what I should do to prevent such an error. I don't have time to break an API on purpose. I am doing NLP POS tagging and I have exactly 6 million stemmed words to store. Fortunately or unfortunately for me, that number is exactly what triggers the failure, so I had to spend 6 hours finding out the reason. The Spy client was actually the first API I tried. As I pointed out in my first post, it is fast, but there is an error. I don't think the user of a finished API should have to worry about memory leaks inside it.
Shi

On Sun, Oct 17, 2010 at 1:11 AM, Kelvin Edmison <[email protected]> wrote:
> Shi,
>
> Be careful when you start calling it a buggy API, especially as you
> present the quality of code that you did in your initial test case. Your
> bugs-per-LOC was pretty high.
>
> However, it seems that you did in fact stumble into a bug in the Spy client,
> but only because you did no error checking at all.
>
> Dustin,
> while trying to re-create this problem and point out the various errors in
> his code, I found that, in his test case, if I did not call Future.get() to
> verify the result of the set, the spyMemcached client leaked memory. Given
> that the Spymemcached wiki says that fire-and-forget is a valid mode of
> usage, this appears to be a bug.
>
> Here's my testcase against spymemcached-2.5.jar:
> 'java -cp .:./memcached-2.5.jar FutureResultLeak true' leaks memory and will
> eventually die OOM.
> ' java -cp .:./memcached-2.5.jar FutureResultLeak false' does not leak and
> runs to completion.
>
> Here's the code. It's based on Shi's testcase so he and I now share the
> blame for code quality :)
>
> ----------------------
> import net.spy.memcached.*;
> import java.lang.*;
> import java.net.*;
> import java.util.concurrent.*;
>
> public class FutureResultLeak {
>
>   public static void main(String[] args) throws Exception {
>     boolean leakMemory = false;
>     if (args.length >= 1) {
>       leakMemory = Boolean.valueOf(args[0]);
>     }
>     System.out.println("Testcase will " + (leakMemory ? "leak memory" : "not leak memory"));
>     MemcachedClient mc=new MemcachedClient(new InetSocketAddress("localhost", 11211));
>     mc.flush();
>     System.out.println("Memcached flushed ...");
>     int count = 0;
>     int logInterval = 100000;
>     int itemExpiryTime = 600;
>     long intervalStartTime = System.currentTimeMillis();
>     for(int i=0;i<6000000;i++){
>       String a = "String"+i;
>       String b = "Value"+i;
>
>       Future<Boolean> f =mc.add(a,itemExpiryTime, b);
>       if (!leakMemory) {
>         f.get();
>       }
>       count++;
>       if (count % logInterval == 0) {
>         long elapsed = System.currentTimeMillis() - intervalStartTime;
>         double itemsPerSec = logInterval*1.0/elapsed;
>         System.out.println(count+ " elements added in " + elapsed + " (" + itemsPerSec + " per sec).");
>         intervalStartTime = System.currentTimeMillis();
>       }
>     }
>
>     System.out.println("done "+ count +" records inserted");
>     mc.shutdown(60, TimeUnit.SECONDS);
>   }
> }
> ----------------------
>
> Regards,
> Kelvin
>
> On 17/10/10 12:28 AM, "Shi Yu" <[email protected]> wrote:
>
>> And I run with the following java command on a 64-bit Unix machine
>> which has 8G memory. I separate the Map into three parts, still
>> failed. TBH I think there is some bug in the spymemcached input
>> method. With Whalin's API there is no any problem with only 2G heap
>> size, just a little bit slower but thats definitely better than being
>> stuck for 6 hours on a bugged API.
>>
>> java -Xms4G -Xmx4G -classpath ./lib/spymemcached-2.5.jar Memcaceload
>>
>> Here is the error output:
>>
>> 2010-10-16 22:40:50.959 INFO net.spy.memcached.MemcachedConnection:
>> Added {QA sa=ocuic32.research/192.168.136.36:11211, #Rops=0, #Wops=0,
>> #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect
>> queue
>> Memchaced flushed ...
>> Cache loader created ...
>> 2010-10-16 22:40:50.989 INFO net.spy.memcached.MemcachedConnection:
>> Connection state changed for sun.nio.ch.selectionkeyi...@25fa1bb6
>> map1 loaded
>> map2 loaded
>> java.lang.OutOfMemoryError: Java heap space
>>     at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:51)
>>     at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
>>     at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
>>     at java.lang.StringCoding.encode(StringCoding.java:266)
>>     at java.lang.String.getBytes(String.java:947)
>>     at net.spy.memcached.KeyUtil.getKeyBytes(KeyUtil.java:20)
>>     at net.spy.memcached.protocol.ascii.OperationImpl.setArguments(OperationImpl.java:86)
>>     at net.spy.memcached.protocol.ascii.BaseStoreOperationImpl.initialize(BaseStoreOperationImpl.java:48)
>>     at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:601)
>>     at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:582)
>>     at net.spy.memcached.MemcachedClient.addOp(MemcachedClient.java:277)
>>     at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:314)
>>     at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:691)
>>     at net.spy.memcached.util.CacheLoader.push(CacheLoader.java:92)
>>     at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:61)
>>     at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:75)
>>     at MemchacedLoad.mapload(MemchacedLoad.java:90)
>>     at MemchacedLoad.main(MemchacedLoad.java:159)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> Shi
>>
>> On Sat, Oct 16, 2010 at 10:23 PM, Dustin <[email protected]> wrote:
>>>
>>> On Oct 16, 6:45 pm, Shi Yu <[email protected]> wrote:
>>>> I have also tried the CacheLoader API, it pops a java GC error. The
>>>> thing I haven't tried is to separate 6 million records into several
>>>> objects and try CacheLoader. But I don't think it should be that
>>>> fragile and complicated. I have spent a whole day on this issue, now I
>>>> just rely the hybrid approach to finish the work. But I would be very
>>>> interested to hear any solution to solve this issue.
>>>
>>> I cannot make any suggestions as to why you got an error without
>>> knowing what you did and what error you got.
>>>
>>> I would not expect the same that you posted to work without a lot of
>>> memory, tweaking, and a very fast network since you're just filling an
>>> output queue as fast as java will allow you.
>>>
>>> You didn't share any code using CacheLoader, so I can only guess as
>>> to how you may have used it to get an error. There are three
>>> different methods you can use -- did you try to create a map with six
>>> million values and then pass it to the CacheLoader API (that would
>>> very likely give you an out of memory error).
>>>
>>> You could also be taxing the GC considerably by converting integers
>>> to strings to compute modulus if your jvm doesn't do proper escape
>>> analysis.
>>>
>>> I can assure you there's no magic that will make it fail to load six
>>> million records through the API as long as you account for the
>>> realities of your network (which CacheLoader does for you) and your
>>> available memory.
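
A middle ground between the two modes in the testcase quoted above (blocking on every Future.get(), or never looking at the results at all) is to keep a bounded window of pending futures and check the oldest one whenever the window is full. The following is only a sketch of that idea, assuming Java 8 or later. It does not use spymemcached: fakeAdd, loadAll and the window size are hypothetical stand-ins for mc.add(...) and the load loop, and whether this sidesteps the leak reported against spymemcached 2.5 has not been verified here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BoundedAdds {

    // Hypothetical stand-in for an asynchronous client call such as
    // mc.add(key, expiry, value); it always reports success.
    public static Future<Boolean> fakeAdd(ExecutorService pool, String key, String value) {
        Callable<Boolean> task = () -> Boolean.TRUE;
        return pool.submit(task);
    }

    // Issues `total` adds while keeping at most `window` results unchecked.
    // Returns the number of adds that reported success.
    public static int loadAll(int total, int window) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Deque<Future<Boolean>> pending = new ArrayDeque<>();
        int succeeded = 0;
        try {
            for (int i = 0; i < total; i++) {
                pending.addLast(fakeAdd(pool, "String" + i, "Value" + i));
                // Window is full: wait for the oldest result before queuing more.
                if (pending.size() >= window) {
                    if (pending.removeFirst().get(5, TimeUnit.SECONDS)) {
                        succeeded++;
                    }
                }
            }
            // Drain whatever is still outstanding at the end of the load.
            while (!pending.isEmpty()) {
                if (pending.removeFirst().get(5, TimeUnit.SECONDS)) {
                    succeeded++;
                }
            }
        } finally {
            pool.shutdown();
        }
        return succeeded;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadAll(100000, 1000) + " adds confirmed");
    }
}
```

With a real client the window size would have to be tuned against heap size and network speed. The point of the sketch is only that every result is eventually checked while the number of unchecked futures stays bounded.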
