Still happening right now. I can repro it pretty easily: schedule ~10
image processing tasks in a queue, and 1 to 3 of them often fail due to
that re-entrancy bug. They then retry automatically and usually go
through after a few attempts.
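
For reference, the repro driver is essentially just this (a minimal
sketch; /task/import_photo matches the task handler in the logs quoted
below, but PHOTO_URLS and the params are placeholders, not our real
code):

    from google.appengine.api import taskqueue

    # Placeholder list of source image URLs (illustrative only)
    PHOTO_URLS = ['http://example.com/photo-%d.jpg' % i for i in range(10)]

    # Enqueue ~10 image processing tasks; each task handler then does an
    # urlfetch() to the EC2 tool, which is where the hang shows up.
    for photo_url in PHOTO_URLS:
        taskqueue.add(url='/task/import_photo',
                      params={'url': photo_url},
                      method='POST')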

5 instances idle / running, no load on the app: only 1 user currently.

It really looks as if the scheduler, when it gets the callback from the
EC2 box, tries to serve it from the same instance on which the
urlfetch() is still running: there must be a bug where the scheduler
thinks that instance is not busy, or something along those lines.
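
To make the suspected re-entrant path concrete, here is a stripped-down
sketch of the two handlers involved (webapp-style, not our exact code;
EC2_TOOL_URL, the 'Backing' model name and the status codes are
placeholders):

    from google.appengine.api import urlfetch
    from google.appengine.ext import db, webapp

    EC2_TOOL_URL = 'http://<our-ec2-host>/process'  # placeholder

    class ImportPhotoTask(webapp.RequestHandler):   # POST /task/import_photo
        def post(self):
            # Holds this instance for up to 10 seconds; while it waits, the
            # EC2 tool calls back into HEAD /api/has_backing on the same app.
            result = urlfetch.fetch(EC2_TOOL_URL, deadline=10)
            # ... store result.content (gzipped JSON) ...

    class HasBacking(webapp.RequestHandler):        # HEAD /api/has_backing
        def head(self):
            # Single datastore get keyed on the hash computed by the EC2 tool.
            backing = db.get(db.Key.from_path('Backing', self.request.get('bid')))
            self.response.set_status(200 if backing else 404)

The HEAD handler is trivial, which is consistent with the AppStats
traces quoted below: the ~9.7 seconds seems to be spent before the
handler even runs, not in the db.get() itself.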

If you look at the distribution of requests, something is clearly wrong:
one instance gets the vast majority of the traffic.

QPS*    Latency*  Requests  Errors  Age      Memory       Availability
0.000   0.0 ms    7         0       0:03:30  17.7 MBytes  Dynamic
0.000   0.0 ms    152       1       0:26:56  68.6 MBytes  Dynamic
0.000   0.0 ms    6         0       0:32:22  23.9 MBytes  Dynamic
0.000   0.0 ms    4         0       0:32:12  24.6 MBytes  Dynamic
0.000   0.0 ms    3         0       0:32:09  14.5 MBytes  Dynamic

On Sep 9, 1:43 pm, Jon McAlister <[email protected]> wrote:
> Ok, let me know if it recurs.
>
> On Wed, Sep 7, 2011 at 3:07 PM, Pol <[email protected]> wrote:
> > Hi Jon,
>
> > Great!
>
> > everpix-alpha
>
> > I still see what appears to be the bug now and then, but I don't have
> > a reproducible case just waiting in a task queue anymore.
>
> > On Sep 7, 1:19 pm, Jon McAlister <[email protected]> wrote:
> >> Hi Pol,
>
> >> I think I have a change that will fix this, which should be available
> >> once 1.5.4 is pushed out. What is the app-id here?
>
> >> On Mon, Sep 5, 2011 at 10:41 AM, Pol <[email protected]> wrote:
> >> > Hi,
>
> >> > Our app settings are as follows:
> >> > - Python + HRD
> >> > - Max Idle Instances: ( 2 )
> >> > - Min Pending Latency: ( 100ms )
> >> > As of right now, there are 3 instances alive.
>
> >> > Without going into too much detail, we have GAE integrated with EC2,
> >> > on which we run remote image processing tools. The tools are called
> >> > directly via HTTP GETs from GAE and they return their results as
> >> > JSON (with gzip content encoding).
>
> >> > There are currently 3 tasks in the processing queue on GAE that keep
> >> > failing: the urlfetch() calls to the EC2 tool reach the
> >> > 10-second timeout and bail. What doesn't make sense is that calling
> >> > the EC2 tool directly with curl from random machines succeeds in less
> >> > than 1 second.
>
> >> > But here's the trick: under certain circumstances, the EC2 tool will
> >> > call back to GAE (a HEAD request that does a single db.get()) to check
> >> > if the image has already been processed, and this happens for these 3
> >> > stuck tasks.
>
> >> > If we call the EC2 tool from the command line with curl, we get the
> >> > normal behavior:
> >> > - EC2 tool retrieves the image from an arbitrary URL and computes a hash
> >> > - EC2 tool does a HEAD call to GAE, passing this hash, to see if the
> >> > image has already been processed
> >> >  - If yes, return empty JSON
> >> >  - If no, process the image and return the full JSON
> >> > This takes about 1 second.
>
> >> > The exact same call made from GAE produces this behavior:
> >> > - EC2 tool retrieves the image from an arbitrary URL and computes a hash
> >> > - EC2 tool does a HEAD call to GAE, passing this hash, to see if the
> >> > image has already been processed
> >> >  -> HEAD call hangs  <--- RE-ENTRANCY / DEADLOCK BUG in GAE
> >> >  -> urlfetch() from GAE to EC2 reaches its 10-second deadline and
> >> > aborts  <-- BREAKS DEADLOCK
> >> >  -> HEAD call finally returns
> >> > - EC2 tool completes normally
>
> >> > GAE logs confirm the bug:
>
> >> > HEAD call from EC2 / curl origin
> >> > 2011-09-05 10:19:52.502 /api/has_backing?
> >> > bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703
> >> > 200 368ms 48cpu_ms 8api_cpu_ms 0kb Everpix-Processor
>
> >> > HEAD call from EC2 / GAE origin
> >> > 2011-09-05 10:20:44.670 /api/has_backing?
> >> > bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703
> >> > 200 9712ms 48cpu_ms 8api_cpu_ms 0kb Everpix-Processor
> >> > 2011-09-05 10:20:44.547 /task/import_photo 500 10348ms 28cpu_ms
> >> > 8api_cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
> >> > (see how the HEAD /api/has_backing call hangs for almost 10 seconds
> >> > and only returns *after* /task/import_photo and its urlfetch() call to
> >> > EC2 have aborted)
>
> >> > And finally, AppStats confirms that it's not the head() Python
> >> > execution itself that's hanging:
>
> >> > (1) 2011-09-05 09:16:06.843 "HEAD /api/has_backing?
> >> > bid=3bc4aeb08e01d3ba4bfab3282d2a198984a4fc1fab2ad9d1e8a39ee3cddd14da"
> >> > 200 real=227ms cpu=24ms api=8ms overhead=0ms (1 RPC)
> >> > (2) 2011-09-05 09:15:56.422 "POST /task/import_photo" 500 real=10002ms
> >> > cpu=33ms api=8ms overhead=0ms (3 RPCs)
> >> > (3) 2011-09-05 09:15:49.328 "HEAD /api/has_backing?
> >> > bid=90e794f348ac76520076f5d02bc67c87c8a9185b8d36affe8377e73fe4820703"
> >> > 200 real=297ms cpu=21ms api=8ms overhead=0ms (1 RPC)
>
> >> > This issue is currently 100% reproducible.
>
> >> > - Pol
>

