Kelvin.

This is the year 2010 and computer programs should not be that fragile.
My code is just a quick, simple toy problem I wrote to find out why my
real program failed so many times. Before I posted my problem, I checked
and searched many documents and read through the API docs, and there is
no clear instruction telling me what I should do to prevent such an
error. I don't have time to break an API on purpose. I am doing NLP POS
tagging and I have exactly 6 million stemmed words to store. Fortunately
or unfortunately for me, that number is exactly what triggers the
failure, so I had to spend 6 hours finding out the reason. Actually, the
spy client was the first API I tried. As I pointed out in my first post,
it is fast; however, there is an error. I don't think that, for a normal
end-product API, a memory leak should be something the user has to
worry about.

Shi

On Sun, Oct 17, 2010 at 1:11 AM, Kelvin Edmison <[email protected]> wrote:
> Shi,
>
>  Be careful when you start calling it a buggy API, especially as you
> present the quality of code that you did in your initial test case.  Your
> bugs-per-LOC was pretty high.
>
> However, it seems that you did in fact stumble into a bug in the Spy client,
> but only because you did no error checking at all.
>
> Dustin,
>  while trying to re-create this problem and point out the various errors in
> his code, I found that, in his test case, if I did not call Future.get() to
> verify the result of the set, the spyMemcached client leaked memory.  Given
> that the Spymemcached wiki says that fire-and-forget is a valid mode of
> usage, this appears to be a bug.
>
> Here's my testcase against spymemcached-2.5.jar:
> 'java -cp .:./memcached-2.5.jar FutureResultLeak true' leaks memory and will
> eventually die OOM.
> ' java -cp .:./memcached-2.5.jar FutureResultLeak false' does not leak and
> runs to completion.
>
> Here's the code. It's based on Shi's testcase so he and I now share the
> blame for code quality :)
>
> ----------------------
> import net.spy.memcached.*;
> import java.lang.*;
> import java.net.*;
> import java.util.concurrent.*;
>
> public class FutureResultLeak {
>
>  public static void main(String[] args) throws Exception {
>    boolean leakMemory = false;
>    if (args.length >= 1) {
>      leakMemory = Boolean.valueOf(args[0]);
>    }
>    System.out.println("Testcase will "
>        + (leakMemory ? "leak memory" : "not leak memory"));
>    MemcachedClient mc=new MemcachedClient(new
> InetSocketAddress("localhost", 11211));
>    mc.flush();
>    System.out.println("Memcached flushed ...");
>    int count = 0;
>    int logInterval = 100000;
>    int itemExpiryTime = 600;
>    long intervalStartTime = System.currentTimeMillis();
>    for(int i=0;i<6000000;i++){
>      String a = "String"+i;
>      String b = "Value"+i;
>
>
>      Future<Boolean> f =mc.add(a,itemExpiryTime, b);
>      if (!leakMemory) {
>        f.get();
>      }
>      count++;
>      if (count % logInterval == 0) {
>        long elapsed = System.currentTimeMillis() - intervalStartTime;
>        // elapsed is in milliseconds, so scale by 1000 to get items per second
>        double itemsPerSec = logInterval*1000.0/elapsed;
>        System.out.println(count+ " elements added in " + elapsed + " ms (" +
>            itemsPerSec + " per sec).");
>        intervalStartTime = System.currentTimeMillis();
>      }
>    }
>
>    System.out.println("done "+ count +" records inserted");
>    mc.shutdown(60, TimeUnit.SECONDS);
>  }
> }
> ----------------------
>
>
> Regards,
>  Kelvin
>
>
>
>
> On 17/10/10 12:28 AM, "Shi Yu" <[email protected]> wrote:
>
>> And I ran it with the following java command on a 64-bit Unix machine
>> which has 8G of memory. I separated the Map into three parts, and it
>> still failed. TBH I think there is some bug in the spymemcached input
>> method. With Whalin's API there is no problem at all with only a 2G
>> heap size. It is just a little bit slower, but that's definitely better
>> than being stuck for 6 hours on a bugged API.
>>
>> java -Xms4G -Xmx4G -classpath ./lib/spymemcached-2.5.jar Memcaceload
>>
>> Here is the error output:
>>
>> 2010-10-16 22:40:50.959 INFO net.spy.memcached.MemcachedConnection:
>> Added {QA sa=ocuic32.research/192.168.136.36:11211, #Rops=0, #Wops=0,
>> #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect
>> queue
>> Memchaced flushed ...
>> Cache loader created ...
>> 2010-10-16 22:40:50.989 INFO net.spy.memcached.MemcachedConnection:
>> Connection state changed for sun.nio.ch.selectionkeyi...@25fa1bb6
>> map1 loaded
>> map2 loaded
>> java.lang.OutOfMemoryError: Java heap space
>>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:51)
>>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
>>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
>>         at java.lang.StringCoding.encode(StringCoding.java:266)
>>         at java.lang.String.getBytes(String.java:947)
>>         at net.spy.memcached.KeyUtil.getKeyBytes(KeyUtil.java:20)
>>         at net.spy.memcached.protocol.ascii.OperationImpl.setArguments(OperationImpl.java:86)
>>         at net.spy.memcached.protocol.ascii.BaseStoreOperationImpl.initialize(BaseStoreOperationImpl.java:48)
>>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:601)
>>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:582)
>>         at net.spy.memcached.MemcachedClient.addOp(MemcachedClient.java:277)
>>         at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:314)
>>         at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:691)
>>         at net.spy.memcached.util.CacheLoader.push(CacheLoader.java:92)
>>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:61)
>>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:75)
>>         at MemchacedLoad.mapload(MemchacedLoad.java:90)
>>         at MemchacedLoad.main(MemchacedLoad.java:159)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> Shi
>>
>> On Sat, Oct 16, 2010 at 10:23 PM, Dustin <[email protected]> wrote:
>>>
>>> On Oct 16, 6:45 pm, Shi Yu <[email protected]> wrote:
>>>> I have also tried the CacheLoader API, and it throws a Java GC error.
>>>> The thing I haven't tried is to separate the 6 million records into
>>>> several objects and try CacheLoader. But I don't think it should be
>>>> that fragile and complicated. I have spent a whole day on this issue,
>>>> and for now I just rely on the hybrid approach to finish the work. But
>>>> I would be very interested to hear any solution to this issue.
>>>
>>>  I cannot make any suggestions as to why you got an error without
>>> knowing what you did and what error you got.
>>>
>>>  I would not expect the sample that you posted to work without a lot of
>>> memory, tweaking, and a very fast network since you're just filling an
>>> output queue as fast as Java will allow you.
>>
>>>  You didn't share any code using CacheLoader, so I can only guess as
>>> to how you may have used it to get an error.  There are three
>>> different methods you can use. Did you try to create a map with six
>>> million values and then pass it to the CacheLoader API? (That would
>>> very likely give you an out of memory error.)
>>
>>
>>>
>>>  You could also be taxing the GC considerably by converting integers
>>> to strings to compute modulus if your jvm doesn't do proper escape
>>> analysis.
>>>
>>>  I can assure you there's no magic that will make it fail to load six
>>> million records through the API as long as you account for the
>>> realities of your network (which CacheLoader does for you) and your
>>> available memory.
>
>
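A note on the two points raised in this thread: Kelvin's test only stays
healthy when it calls Future.get() after every add, and Dustin points out
that the original loop just fills an output queue as fast as Java allows.
One possible middle ground, sketched below, is to cap how many
asynchronous stores are pending at once. This is only an illustration
under stated assumptions: it does not use spymemcached at all, and
fakeAsyncStore, load and maxInFlight are hypothetical names, with
fakeAsyncStore standing in for an asynchronous call such as
mc.add(key, expiry, value).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledLoadSketch {

  // Hypothetical stand-in for an asynchronous client call; NOT a spymemcached API.
  static CompletableFuture<Boolean> fakeAsyncStore(ExecutorService pool, String key, String value) {
    return CompletableFuture.supplyAsync(() -> !key.isEmpty(), pool);
  }

  // Issues `total` stores with at most `maxInFlight` pending at any time.
  // Returns {number of successful stores, peak number of pending stores}.
  static int[] load(int total, int maxInFlight) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    Semaphore permits = new Semaphore(maxInFlight);
    AtomicInteger succeeded = new AtomicInteger();
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    try {
      for (int i = 0; i < total; i++) {
        // Blocks the producer once maxInFlight stores are pending (backpressure).
        permits.acquireUninterruptibly();
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        fakeAsyncStore(pool, "String" + i, "Value" + i)
            .whenComplete((ok, err) -> {
              if (err == null && Boolean.TRUE.equals(ok)) {
                succeeded.incrementAndGet();
              }
              inFlight.decrementAndGet();
              permits.release(); // frees a slot for the next store
            });
      }
      // Taking every permit back means all outstanding stores have completed.
      permits.acquireUninterruptibly(maxInFlight);
    } finally {
      pool.shutdown();
    }
    return new int[] {succeeded.get(), peak.get()};
  }

  public static void main(String[] args) {
    int[] result = load(100000, 64);
    System.out.println("stored=" + result[0] + " peakInFlight=" + result[1]);
  }
}
```

With a real client the permit would be released when the operation's
future completes. Whether that is enough to avoid the leak Kelvin observed
with spymemcached 2.5 is not something this sketch establishes.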
