Hi!
I'm fetching webpages and parsing them. I'm seeing weird behavior that
looks like a GAE bug.
One of the pages is retrieved with totally different content.
I'm fetching the
http://www.gamer-district.com/modules/mod_realmcore/mod_realmcore.php
page, and most of the time it is retrieved incorrectly.
It looks like a stale/cached version is being returned.
I am logging the HTTP headers and the page content. This only happens
for this particular URL.
The logs when the content is incorrect:
2011-02-12 02:59:35.159
[keepeyeon/1.348283144444156083].<stdout>: 10:59:35.159 [pool-4-
thread-1][keepEyeOn] DEBUG wowpop.fetchers.UrlFetcher - Headers:
{server=[cloudflare-nginx], date=[Sat, 12 Feb 2011 10:44:31 GMT],
content-type=[text/html], connection=[keep-alive], x-powered-by=[PHP/
5.2.15], content-length=[1190], age=[904], x-google-cache-
control=[remote-cache-hit], via=[HTTP/1.1 GWA (remote cache hit)]}
2011-02-12 02:59:35.159
[keepeyeon/1.348283144444156083].<stdout>: 10:59:35.159 [pool-4-
thread-1][keepEyeOn] DEBUG wowpop.fetchers.UrlFetcher - fetched
500chars: <table style="width: 100%; border: 0; padding: 1px"> <tr>
<td><b style="color: #fff">Gamer District 7x</b></td> <td style="text-
align: right;"> <img src="http://www.gamer-district.com/modules/
mod_realmcore/wow_on.png"></td></tr></table><table style="width: 100%;
border: 0; padding: 3"> <tr> <td>Uptime:</td>
<td>1 hours 15
minutes</td> </tr> <tr> <td>Players
online:</td> <td><b>457</b>
<span style="color: #ADDFFF">187</font> / <span
style="color: #F62817">270</font></td>
Five minutes earlier the same page was retrieved correctly. Log:
2011-02-12 02:54:34.244
[keepeyeon/1.348283144444156083].<stdout>: 10:54:34.243 [pool-4-
thread-1][keepEyeOn] DEBUG wowpop.fetchers.UrlFetcher - Headers:
{server=[cloudflare-nginx], date=[Sat, 12 Feb 2011 10:54:34 GMT],
content-type=[text/html], connection=[keep-alive], x-powered-by=[PHP/
5.2.15], set-
cookie=[__cfduid=db133a16aaad88bded9766b18448779691297508074;
expires=Mon, 23 Dec 2019 23:50:00 GMT; path=/; domain=.gamer-
district.com, __cfduid=db133a16aaad88bded9766b18448779691297508074;
expires=Mon, 23 Dec 2019 23:50:00 GMT; path=/; domain=.www.gamer-
district.com], x-google-cache-control=[remote-fetch], via=[HTTP/1.1
GWA]}
2011-02-12 02:54:34.244
[keepeyeon/1.348283144444156083].<stdout>: 10:54:34.244 [pool-4-
thread-1][keepEyeOn] DEBUG wowpop.fetchers.UrlFetcher - fetched
500chars: <table style="width: 100%; border: 0; padding: 1px"> <tr>
<td><b style="color: #fff">Gamer District 7x</b></td> <td style="text-
align: right;"> <img src="http://www.gamer-district.com/modules/
mod_realmcore/wow_on.png"></td></tr></table><table style="width: 100%;
border: 0; padding: 3"> <tr> <td>Uptime:</td>
<td>2 hours 35
minutes</td> </tr> <tr> <td>Players
online:</td> <td><b>1400</b>
<span style="color: #ADDFFF">630</font> / <span
style="color: #F62817">770</font></td
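The headers are different and carry different times. The bad response
has age=[904] and x-google-cache-control=[remote-cache-hit], and its
date header (10:44:31 GMT) is about 15 minutes older than the request
(10:59:35), while the good response has
x-google-cache-control=[remote-fetch] and a date matching the request
time. Also note the Uptime value in the HTML. I'm not caching anything
myself.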
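
To make the difference between the two responses easier to spot in the logs, here is a rough sketch of a check on the response headers, plus a request that asks caches along the way to revalidate. The class and method names (CacheCheck, looksCached, openUncached) are made up for illustration. The check relies only on the age and x-google-cache-control headers shown in the logs above. Whether the URL Fetch cache actually honors the Cache-Control request header is an assumption I have not confirmed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of any GAE API.
class CacheCheck {

    /** Returns the first value of a header (case-insensitive name match), or null. */
    static String firstHeader(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String key = e.getKey(); // HttpURLConnection stores the status line under a null key
            if (key != null && key.equalsIgnoreCase(name)
                    && e.getValue() != null && !e.getValue().isEmpty()) {
                return e.getValue().get(0);
            }
        }
        return null;
    }

    /**
     * Heuristic: treat the response as served from a cache if it has a
     * positive Age header or an x-google-cache-control value containing
     * "cache-hit" (as in the "remote-cache-hit" value from the bad log).
     */
    static boolean looksCached(Map<String, List<String>> headers) {
        String age = firstHeader(headers, "Age");
        if (age != null) {
            try {
                if (Long.parseLong(age.trim()) > 0) {
                    return true;
                }
            } catch (NumberFormatException ignored) {
                // Malformed Age value: fall through to the other header.
            }
        }
        String gcc = firstHeader(headers, "X-Google-Cache-Control");
        return gcc != null && gcc.toLowerCase(Locale.ROOT).contains("cache-hit");
    }

    /**
     * Prepares (but does not send) a GET request that asks intermediate
     * caches to revalidate. Assumption: the cache in front of the fetch
     * respects these standard HTTP request headers.
     */
    static HttpURLConnection openUncached(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setUseCaches(false);
        conn.setRequestProperty("Cache-Control", "no-cache, max-age=0");
        conn.setRequestProperty("Pragma", "no-cache");
        return conn;
    }
}
```

Calling looksCached(conn.getHeaderFields()) after a fetch would flag the first response above (age=[904], remote-cache-hit) and not the second (remote-fetch), so the parser could skip or retry cached results.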