Hi,

I'm trying to use libcurl to download the RSS feed from Google News. The
default feed given to me is

http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss

When I try to get it using libcurl, the server gives me an HTML web page
(200 response). In CURLOPT_VERBOSE mode I get:

* About to connect() to news.google.com port 80 (#0)
*   Trying 74.125.79.99... * Connected to news.google.com (74.125.79.99) port 80 (#0)
> GET /news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss HTTP/1.1
User-Agent: myapplication/1.0
Host: news.google.com
Accept: */*

< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8

I can get the RSS feed with the curl command-line tool without any problem:

curl -i -A "myapplication/1.0" "http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss"

This gives XML (also a 200 response).

Google doesn't like to be scraped, which is why I set a user agent string.
I'm thinking that they detect my application as a scraper, so they serve me
the HTML. Another possibility is that the URL is not formed properly: my
application is passing &amp; to libcurl instead of &.
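
To rule out that second possibility, the URL could be cleaned up before it
is handed to CURLOPT_URL. The following is only a minimal sketch, assuming
the URL is available as a std::string. The helper name decodeAmpersands is
hypothetical and is not part of libcurl or of my application:

```cpp
#include <string>

// Hypothetical helper (not a libcurl function): replace every "&amp;"
// in a URL with a plain "&" so the query string is sent unescaped.
std::string decodeAmpersands(std::string url)
{
    const std::string entity = "&amp;";
    std::string::size_type pos = 0;

    // Single pass: after each replacement, resume searching just past
    // the '&' that was written, so nothing is decoded twice.
    while ((pos = url.find(entity, pos)) != std::string::npos) {
        url.replace(pos, entity.size(), "&");
        ++pos;
    }
    return url;
}
```

The returned string (via c_str()) would then be passed to
curl_easy_setopt(handle, CURLOPT_URL, ...) in place of the escaped URL.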

The curl tool can get the XML using this URL and the same user agent
string as my application so I don't see why I can't get it.

I tried looking at the output from curl using --libcurl but can't see any
reason why my application is different.

Here is the code I am using:

  handle = curl_easy_init();

  // Set up options
  curl_easy_setopt(handle, CURLOPT_URL, url.ascii());
#if DEBUG
  // Numeric options are read as long, so pass 1L rather than 1
  curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);
#endif
  curl_easy_setopt(handle, CURLOPT_USERAGENT, useragent.ascii());
  // Timeout for the whole transfer, in seconds
  curl_easy_setopt(handle, CURLOPT_TIMEOUT, (long)timeout);
  if(!proxy.isEmpty())
    curl_easy_setopt(handle, CURLOPT_PROXY, proxy.ascii());
  // Follow HTTP redirects
  curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);

It's probably a case of not seeing the wood for the trees.
What am I doing wrong?



      
