On Oct 16, 11:24 pm, Shi Yu <[email protected]> wrote:
> Kelvin.
>
> This is year 2010 and computer programs should not be that fragile.
> And I believe my code is just a fast simple toy problem trying to find
> out why I failed too many times in my real problem. Before I post my
> problem, I checked and searched many documents, I read through the API
> and there is no clear instruction telling me what should I do to
> prevent such an error. I don't have time to bug an API on purpose, I
> am doing NLP pos tagging and I have exactly 6 million stemmed word to
> store. Fortunately or unlucky to me, that number exactly triggers the
> failure so I had to spend 6 hours finding out the reason. Actually spy
> client is the first API I tried, as I pointed out in my first post, it
> is fast, however, there is an error. I don't think for a normal
> end-product API, the memory leak issue should be considered by the
> user.

  I agree, but I don't see anything that I would consider a memory
leak.  I see a few things generating massive amounts of data and
storing it in memory faster than it can get processed.

  Unfortunately, attempting to make this easier in the common case has
made it more confusing.  A while back, instead of slowing down queue
insertion, the client would just fail and tell you that you were
overflowing the queue.  That made it easy to understand when and how
to back off.  Now it lets you use up excessive memory on the client
side: the op queue stays completely full, which in turn keeps the read
and write queues completely full, which means you need a tremendous
amount of memory to do anything at all.

  I'm sure you can see that using memcached as a write-only data store
is a bit of an edge case.  It's come up enough for CacheLoader to have
been written in the first place, but a typical application is reading
more than it's writing.  The situation you're running into doesn't
happen if your filler thread has as little as 0.01% reads mixed in with
its writes (or even just checks a return value occasionally).

  If you do one of the following, you will be fine:

 1) Set op queue max block time to 0 and
     a) Build an iterator of your data and use CacheLoader.load with
that iterator
     b) Use CacheLoader.push
 2) Every one or two hundred thousand operations, check a return
value (that is, wait on the Future that one of your sets gave back).
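
  To make option 2 concrete, here is a rough sketch of the pattern.
It deliberately does not use the real client: FakeAsyncCache is a
made-up stand-in (a single worker thread draining an unbounded queue)
so the example runs on its own, and ThrottledLoader and its parameters
are hypothetical names too.  With spymemcached, the Future would be
the one returned by MemcachedClient.set, and the ordering guarantee
assumed in the comment below may not hold across multiple servers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronous cache client: set() queues
// the write on an unbounded queue and returns a Future immediately.
class FakeAsyncCache {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, String> store = new ConcurrentHashMap<>();

    Future<Boolean> set(String key, String value) {
        return worker.submit(() -> {
            store.put(key, value);
            return true;
        });
    }

    int size() {
        return store.size();
    }

    void shutdown() {
        worker.shutdown();
    }
}

class ThrottledLoader {
    // Queues `total` writes and blocks on the most recent Future every
    // `checkEvery` operations. Returns how many times it waited.
    static int load(FakeAsyncCache cache, int total, int checkEvery) {
        int waits = 0;
        for (int i = 1; i <= total; i++) {
            Future<Boolean> result = cache.set("word:" + i, "stem" + i);
            if (i % checkEvery == 0) {
                try {
                    // The stand-in processes writes in order, so once this
                    // one is done, every earlier write has drained as well.
                    if (!result.get(30, TimeUnit.SECONDS)) {
                        throw new IllegalStateException("set failed at op " + i);
                    }
                } catch (IllegalStateException e) {
                    throw e;
                } catch (Exception e) {
                    throw new IllegalStateException("wait failed at op " + i, e);
                }
                waits++;
            }
        }
        return waits;
    }
}
```

  The point of the periodic get() is only to let the producer pause
until the queue has drained, so the backlog never grows much beyond
checkEvery pending operations.  The interval is a tuning knob; the
one or two hundred thousand figure above is the starting point
suggested here, not something this sketch measures.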
