What you are doing wrong is violating Google's Terms of Service yourself
and asking others to do the same. Among other things, that can get your
IP banned, along with that of anyone who aids you (or worse). Please don't.
Just because something can be done does not mean it should be done.

On Tue, May 24, 2016 at 11:21 AM, Kumar Gauraw <string.gau...@gmail.com> wrote:
> Hello Experts,
>
> I am trying to scrape data from Google News for a particular topic using the
> XML and RCurl packages of R. I am able to extract the summary part of the news
> through XPath, but when I try to extract the titles and links of the news
> in a similar way, it does not work. Please note this work is just for POC
> purposes, and I would make a maximum of 500 requests per day so that Google's
> TOS remains intact.
>
>
> library(XML)
> library(RCurl)
>
> # Build the Google News search URL for a search term.
> getGoogleURL <- function(search.term, domain = '.co.in', quotes = TRUE)
> {
>   search.term <- gsub(' ', '%20', search.term)
>   if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
>   paste('http://www.google', domain,
>         '/search?hl=en&gl=in&tbm=nws&authuser=0&q=', search.term, sep = '')
> }
>
> search.term <- "IPL 2016"
> quotes <- FALSE
> search.url <- getGoogleURL(search.term = search.term, quotes = quotes)
>
> # Extract the summary text of each result.
> getGoogleSummary <- function(google.url) {
>   doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
>   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
>   nodes <- getNodeSet(html, "//div[@class='st']")
>   sapply(nodes, xmlValue)
> }
>
> # Problem is with this part of the code, specifically the getNodeSet() call
> getGoogleTitle <- function(google.url) {
>   doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
>   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
>   nodes <- getNodeSet(html, "//a[@class='l _HId']")
>   sapply(nodes, xmlValue)
> }
>
> Kindly help me understand where I am going wrong so that I can rectify
> the code and get the correct output.
>
> Thank you.
>
> With Regards,
> Kumar Gauraw
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

