I've logged it as issue #12104 https://code.google.com/p/googleappengine/issues/detail?id=12104
Thanks for pointing out the presence of a whitelist. This explains why I've seen uncompressed responses in the logs for possibly lesser-known mobile user agents, such as custom embedded webviews. This is unfortunate, since it is precisely these mobile devices that stand to gain the most from compressed content. Also note that application/site owners are rarely in a position to ask crawlers or users to modify their user agent string to comply with such a GAE-specific requirement.

On Tuesday, June 30, 2015 at 3:27:59 AM UTC+8, Nick (Cloud Platform Support) wrote:
>
> Hey Taengoo,
>
> It seems as though you may have stumbled on a valid Feature Request in the
> making. In the docs, it's explained that serving content-encoding: gzip
> responses is done based on a combination of User-Agent and Accept-Encoding
> headers <https://cloud.google.com/appengine/kb/general#compression>,
> however it appears that the Twitterbot UA string doesn't pass the test.
>
> Attached is a .tar.gz containing an example app you can deploy, and a
> script you can use, to test this behaviour on App Engine. If you change the
> application id in app.yaml inside the app/ directory, you can deploy the
> app. At that point, you'll want to run :
>
> ./curl-uas.sh 1.testheaders.APPID.appspot.com
>
> Where your APPID will be an actual app id.
>
> This script runs through the user-agents in user-agents.txt, which contain
> the most statistically-popular UA strings on the web at the moment, along
> with several test values. You'll notice that your observations are
> replicated for Twitterbot-style UA strings, while the special User-Agent
> "gzip", as explained in the docs, can force compression.
>
> I think you should open a Feature Request thread in the public issue
> tracker <http://code.google.com/p/googleappengine/issues/list> to either
> have the Twitterbot UA included in the list of those which can accept gzip
> if they request it via Accept-Encoding, or to simply have the
> Accept-Encoding header be respected.
>
> If possible, you could modify your Twitterbot to use UA "gzip", in order
> to simply get it working today.
>
> Best wishes,
>
> Nick
>
> On Monday, June 29, 2015 at 6:27:04 AM UTC-4, Taengoo Taengstagram wrote:
>>
>> I've noticed that when Twitterbot crawls my app on GAE, the response does
>> not appear to be gzipped (as seen by the response bytes size in GAE logs).
>> I've tested this with other apps deployed on *.appspot.com, for
>> example https://ga-dev-tools.appspot.com/.
>>
>> To illustrate, I'm using a test user agent "Twitterbot/9.0", although
>> the actual Twitter user agent is "Twitterbot/1.0".
>>
>> # Test case 1: With a generic Mozilla useragent Mozilla/9.0 + gzip
>> headers, response returned is gzipped
>> $ curl 'https://ga-dev-tools.appspot.com/' -H 'Accept-Encoding: gzip,
>> deflate, sdch' --compressed -A 'Mozilla/9.0' -i
>>
>> HTTP/1.1 200 OK
>> Content-Type: text/html; charset=utf-8
>> Cache-Control: no-cache
>> Content-Encoding: gzip
>> Vary: Accept-Encoding
>> Date: Mon, 29 Jun 2015 10:11:35 GMT
>> Server: Google Frontend
>> Alternate-Protocol: 443:quic,p=1
>> Transfer-Encoding: chunked
>>
>> # Test case 2: With a Twitterbot useragent Twitterbot/9.0 + gzip headers,
>> response returned is not gzipped
>> $ curl 'https://ga-dev-tools.appspot.com/' -H 'Accept-Encoding: gzip,
>> deflate, sdch' --compressed -A 'Twitterbot/9.0' -i
>>
>> HTTP/1.1 200 OK
>> Content-Type: text/html; charset=utf-8
>> Cache-Control: no-cache
>> Date: Mon, 29 Jun 2015 10:12:06 GMT
>> Server: Google Frontend
>> Content-Length: 7956
>> Alternate-Protocol: 443:quic,p=1
>>
>> # Test case 3: With a Twitterbot useragent Twitterbot/9.0 + no other
>> headers, response returned is not gzipped
>> $ curl 'https://ga-dev-tools.appspot.com/' -A 'Mozilla/9.0' -i
>>
>> HTTP/1.1 200 OK
>> Content-Type: text/html; charset=utf-8
>> Cache-Control: no-cache
>> Date: Mon, 29 Jun 2015 10:13:17 GMT
>> Server: Google Frontend
>> Content-Length: 7956
>> Alternate-Protocol: 443:quic,p=1
>>
>> You will notice that GAE is returning identical responses for test #2
>> (Twitterbot) and #3 (uncompressed request). This is unexpected and rather
>> puzzling. Any idea why?
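
For anyone who wants to repeat the three curl checks quoted above from a script, here is a rough Python sketch using only the standard library. The helper names (`CASES`, `build_request`, `is_gzipped`, `probe`) are my own and purely illustrative; the user agents, header value and URL are copied from the curl commands as they were actually run. It only inspects the Content-Encoding response header and does not decompress the body. One caveat I'm aware of: when no Accept-Encoding is supplied, Python's HTTP client sends "Accept-Encoding: identity" rather than omitting the header, so the third case is close to, but not byte-for-byte the same as, the curl request.

```python
import urllib.request

# (label, User-Agent, send Accept-Encoding?) mirroring the three curl commands.
CASES = [
    ("Mozilla/9.0 + Accept-Encoding", "Mozilla/9.0", True),
    ("Twitterbot/9.0 + Accept-Encoding", "Twitterbot/9.0", True),
    ("Mozilla/9.0, no Accept-Encoding", "Mozilla/9.0", False),
]


def build_request(url, user_agent, accept_gzip):
    """Build a GET request with the given User-Agent and optional Accept-Encoding."""
    headers = {"User-Agent": user_agent}
    if accept_gzip:
        # Same header value as in the quoted curl commands.
        headers["Accept-Encoding"] = "gzip, deflate, sdch"
    return urllib.request.Request(url, headers=headers)


def is_gzipped(response_headers):
    """Return True if the Content-Encoding header lists gzip."""
    value = response_headers.get("Content-Encoding", "") or ""
    return "gzip" in [token.strip().lower() for token in value.split(",")]


def probe(url):
    """Fetch url once per case and report whether the response was gzipped."""
    for label, user_agent, accept_gzip in CASES:
        request = build_request(url, user_agent, accept_gzip)
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{label}: gzipped={is_gzipped(response.headers)}")
```

Calling `probe("https://ga-dev-tools.appspot.com/")` runs the three cases against the same app used in the original report; what it prints depends on how the frontend treats each user agent at the time you run it.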
