Hey Taengoo,

It seems as though you may have stumbled on a valid Feature Request in the making. The docs explain that Content-Encoding: gzip responses are served based on a combination of the User-Agent and Accept-Encoding headers <https://cloud.google.com/appengine/kb/general#compression>; however, it appears that the Twitterbot UA string doesn't pass that test.

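One way to see this from the client side is to request the same URL under different User-Agent values and look at the Content-Encoding of each response. What follows is only a rough Python sketch of such a probe, not a description of how App Engine decides internally. The function names (build_request, content_encoding, compare) and the default UA list are placeholders of my own.

```python
import urllib.request

# Hypothetical set of User-Agent values to compare. "gzip" is the special
# value that the docs say can force compression.
DEFAULT_UAS = ["Mozilla/9.0", "Twitterbot/1.0", "gzip"]


def build_request(url, user_agent, accept_gzip=True):
    """Build a GET request that sends the given UA and, optionally, asks for gzip."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    if accept_gzip:
        req.add_header("Accept-Encoding", "gzip")
    return req


def content_encoding(url, user_agent, timeout=10):
    """Return the response's Content-Encoding header, or "identity" if absent.

    urllib neither decompresses the body nor hides this header, so the value
    reflects what the server actually sent.
    """
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Content-Encoding", "identity")


def compare(url, user_agents=DEFAULT_UAS):
    """Map each User-Agent to the Content-Encoding the server answered with."""
    return {ua: content_encoding(url, ua) for ua in user_agents}
```

Calling compare() on your app's URL issues one request per UA. I have left out any expected output, since it depends on what the server does at the time you run it.
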
Attached is a .tar.gz containing an example app you can deploy, and a script you can use, to test this behaviour on App Engine. Once you change the application id in app.yaml inside the app/ directory, you can deploy the app. At that point, you'll want to run:

    ./curl-uas.sh 1.testheaders.APPID.appspot.com

where APPID is your actual app id. The script runs through the user agents in user-agents.txt, which contains the most statistically popular UA strings on the web at the moment, along with several test values. You'll notice that your observations are replicated for Twitterbot-style UA strings, while the special User-Agent "gzip", as explained in the docs, can force compression.

I think you should open a Feature Request thread in the public issue tracker <http://code.google.com/p/googleappengine/issues/list> to either have the Twitterbot UA included in the list of those which can accept gzip if they request it via Accept-Encoding, or to simply have the Accept-Encoding header be respected.

If possible, you could modify your Twitterbot to use UA "gzip", in order to simply get it working today.

Best wishes,

Nick

On Monday, June 29, 2015 at 6:27:04 AM UTC-4, Taengoo Taengstagram wrote:
>
> I've noticed that when Twitterbot crawls my app on GAE, the response does
> not appear to be gzipped (as seen by the response byte size in GAE logs).
> I've tested this with other apps deployed on *.appspot.com, for
> example https://ga-dev-tools.appspot.com/.
>
> To illustrate, I'm using a test user agent "Twitterbot/9.0", although the
> actual Twitter user agent is "Twitterbot/1.0".
>
> # Test case 1: With a generic Mozilla user agent Mozilla/9.0 + gzip
> headers, response returned is gzipped
> $ curl 'https://ga-dev-tools.appspot.com/' -H 'Accept-Encoding: gzip, deflate, sdch' --compressed -A 'Mozilla/9.0' -i
>
> HTTP/1.1 200 OK
> Content-Type: text/html; charset=utf-8
> Cache-Control: no-cache
> Content-Encoding: gzip
> Vary: Accept-Encoding
> Date: Mon, 29 Jun 2015 10:11:35 GMT
> Server: Google Frontend
> Alternate-Protocol: 443:quic,p=1
> Transfer-Encoding: chunked
>
> # Test case 2: With a Twitterbot user agent Twitterbot/9.0 + gzip headers,
> response returned is not gzipped
> $ curl 'https://ga-dev-tools.appspot.com/' -H 'Accept-Encoding: gzip, deflate, sdch' --compressed -A 'Twitterbot/9.0' -i
>
> HTTP/1.1 200 OK
> Content-Type: text/html; charset=utf-8
> Cache-Control: no-cache
> Date: Mon, 29 Jun 2015 10:12:06 GMT
> Server: Google Frontend
> Content-Length: 7956
> Alternate-Protocol: 443:quic,p=1
>
> # Test case 3: With a generic Mozilla user agent Mozilla/9.0 + no other
> headers, response returned is not gzipped
> $ curl 'https://ga-dev-tools.appspot.com/' -A 'Mozilla/9.0' -i
>
> HTTP/1.1 200 OK
> Content-Type: text/html; charset=utf-8
> Cache-Control: no-cache
> Date: Mon, 29 Jun 2015 10:13:17 GMT
> Server: Google Frontend
> Content-Length: 7956
> Alternate-Protocol: 443:quic,p=1
>
> You will notice that GAE is returning identical responses for test #2
> (Twitterbot) and #3 (uncompressed request). This is unexpected and rather
> puzzling. Any idea why?
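
For anyone who wants to compare pasted `curl -i` output like the test cases above without eyeballing it, here is a small Python sketch. The helper names (parse_headers, is_gzipped) are made up for illustration, and the two header blocks are copied from test cases 1 and 2 in the quoted message.

```python
def parse_headers(raw):
    """Parse the header block printed by `curl -i` into a lowercase-keyed dict."""
    headers = {}
    # Skip the first line, which is the status line ("HTTP/1.1 200 OK").
    for line in raw.strip().splitlines()[1:]:
        if not line.strip():
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def is_gzipped(raw):
    """True if the response headers declare Content-Encoding: gzip."""
    return parse_headers(raw).get("content-encoding", "").lower() == "gzip"


# Headers from test case 1 (Mozilla/9.0 + Accept-Encoding).
TEST_1 = """HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Date: Mon, 29 Jun 2015 10:11:35 GMT
Server: Google Frontend
Alternate-Protocol: 443:quic,p=1
Transfer-Encoding: chunked"""

# Headers from test case 2 (Twitterbot/9.0 + Accept-Encoding).
TEST_2 = """HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Date: Mon, 29 Jun 2015 10:12:06 GMT
Server: Google Frontend
Content-Length: 7956
Alternate-Protocol: 443:quic,p=1"""

for label, raw in [("test 1", TEST_1), ("test 2", TEST_2)]:
    print(label, "gzipped" if is_gzipped(raw) else "not gzipped")
```

This checks only what the headers declare. It does not inspect the body bytes.
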
Attachment: test-ua-content-encoding.tar.gz
