Hi,

I'm trying to use libcurl to download the RSS feed from Google News. The default feed given to me is

    http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss

When I try to get it using libcurl, the server gives me an HTML webpage (200 response). In CURLOPT_VERBOSE mode I get:

    * About to connect() to news.google.com port 80 (#0)
    *   Trying 74.125.79.99...
    * Connected to news.google.com (74.125.79.99) port 80 (#0)
    > GET /news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss HTTP/1.1
    User-Agent: myapplication/1.0
    Host: news.google.com
    Accept: */*

    < HTTP/1.1 200 OK
    < Content-Type: text/html; charset=UTF-8

I can get the RSS feed via the curl command line with no problems:

    curl -i -A "myapplication/1.0" "http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss"

This gives XML (also a 200 response).

You'll notice that Google doesn't like to be scraped, hence setting a user agent string. I'm thinking that they detect my application as a scraper, so they serve me the HTML.

Another possibility is that the URL is not formed properly. My application is passing &amp; to libcurl instead of &. The curl tool can get the XML using this URL and the same user agent string as my application, so I don't see why I can't get it. I tried looking at the output from curl using --libcurl but can't see any reason why my application is different.

Here is the code I am using:

    handle = curl_easy_init();

    // Set up options
    curl_easy_setopt(handle, CURLOPT_URL, url.ascii());
    #if DEBUG
    curl_easy_setopt(handle, CURLOPT_VERBOSE, 1);
    #endif
    curl_easy_setopt(handle, CURLOPT_USERAGENT, useragent.ascii());
    curl_easy_setopt(handle, CURLOPT_TIMEOUT, timeout);
    if(!proxy.isEmpty())
        curl_easy_setopt(handle, CURLOPT_PROXY, proxy.ascii());
    curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1);

It's probably a case of not seeing the wood for the trees. What am I doing wrong?
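
To rule out the malformed-URL possibility, one option is to decode the entity before handing the URL to libcurl. Below is a minimal sketch in C, assuming the URL string was copied from an XML or HTML source where & is written as &amp;. The name decode_amp_entities is a hypothetical helper made up for illustration; it is not a libcurl function, and it only handles this one entity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libcurl): returns a newly allocated
 * copy of `url` in which every "&amp;" is replaced by a plain "&".
 * The caller must free() the result. Returns NULL if malloc fails. */
char *decode_amp_entities(const char *url)
{
    static const char entity[] = "&amp;";
    const size_t entity_len = sizeof entity - 1;

    /* The decoded string is never longer than the input. */
    char *out = malloc(strlen(url) + 1);
    char *dst = out;

    if (out == NULL)
        return NULL;

    while (*url != '\0') {
        if (strncmp(url, entity, entity_len) == 0) {
            *dst++ = '&';        /* collapse the entity to one character */
            url += entity_len;
        } else {
            *dst++ = *url++;     /* copy everything else unchanged */
        }
    }
    *dst = '\0';
    return out;
}
```

The decoded string would then be the one passed to curl_easy_setopt(handle, CURLOPT_URL, ...). If the server starts returning XML after that change, the escaped ampersands were the cause rather than scraper detection.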
