Hello Experts,

I am trying to scrape data from Google News for a particular topic using the XML and RCurl packages in R. I am able to extract the summary part of each news item through XPath, but extracting the titles and links of the news items in a similar way is not working. Please note that this work is just for POC purposes, and I would make a maximum of 500 requests per day so as to stay within Google's TOS.

library(XML)
library(RCurl)

# Build the Google News search URL for a search term.
getGoogleURL <- function(search.term, domain = '.co.in', quotes = TRUE) {
  search.term <- gsub(' ', '%20', search.term)
  if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
  paste('http://www.google', domain,
        '/search?hl=en&gl=in&tbm=nws&authuser=0&q=', search.term, sep = '')
}

search.term <- "IPL 2016"
quotes <- FALSE
search.url <- getGoogleURL(search.term = search.term, quotes = quotes)

# Extract the summary text of each result (this part works).
getGoogleSummary <- function(google.url) {
  doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
  html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
  nodes <- getNodeSet(html, "//div[@class='st']")
  return(sapply(nodes, xmlValue))
}

# Problem is with this part of the code
getGoogleTitle <- function(google.url) {
  doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
  html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
  nodes <- getNodeSet(html, "//a[@class='l _HId']")  # <- the line in question
  return(sapply(nodes, xmlValue))
}

Kindly help me understand where I am going wrong so that I can rectify the code and get the correct output.

Thank you.

With Regards,
Kumar Gauraw