Re: [R] Finding the right url for RCurl
On 8/4/2010 2:07 PM, AndrewPage wrote:
> Hi all,
>
> I am using RCurl to try and download data from a website, but I'm
> having trouble finding out what URL to use. Here is the site:
>
> http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
>
> See how in the upper right, above the displayed sheet, there's a link
> to download the data as a .csv file? When I hit copy url and paste
> into getURL in R, it doesn't work. That's no surprise because there
> isn't a URL in what gets pasted. I was just wondering if there's any
> way around this.
>
> Thanks in advance,
> Andrew

I looked at the page. The link you mentioned runs some JavaScript which alters some values in a form and posts that form; the response to that post is the CSV file. There is no simple URL that points to the file.

I don't know if RCurl can post forms, but if it can, you may be able to mimic the form. The structure of the form starts on line 191 of the page source (or search for aspnetForm), and appropriate values for __EVENTTARGET are given in the doPostBack call on line 258. Some understanding of HTML and HTTP may be necessary to know what is going on. I don't know if this would work or not.

Also, the site has not made it easy to download the CSV file directly. That may be intentional. The Terms of Service of the site may have something to say about doing this as well.

--
Brian Diggs
Senior Research Associate, Department of Surgery,
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
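For what it's worth, RCurl does provide postForm(), so the form post described above can at least be attempted. The following is only a rough sketch of the idea, not something verified against this site: hidden_fields() and download_csv() are hypothetical helper names, the regex-based extraction is a dependency-free shortcut (a real HTML parser such as htmlParse() from the XML package would be more robust), and the __EVENTTARGET value is deliberately left as a placeholder because it has to be copied from the doPostBack call in the page source.

```r
# Collect name/value pairs of all <input type="hidden"> tags in an HTML string.
# ASP.NET pages usually keep __VIEWSTATE and similar state in such fields.
hidden_fields <- function(html) {
  html <- paste(html, collapse = "\n")
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  tags <- tags[grepl("type\\s*=\\s*[\"']hidden[\"']", tags, ignore.case = TRUE)]

  # Pull a single quoted attribute value out of one tag (NA if absent).
  attr_value <- function(tag, attr) {
    pattern <- paste0("\\b", attr, "\\s*=\\s*[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else NA_character_
  }

  values <- vapply(tags, attr_value, character(1), attr = "value", USE.NAMES = FALSE)
  names(values) <- vapply(tags, attr_value, character(1), attr = "name", USE.NAMES = FALSE)
  as.list(values)
}

# Fetch the page, copy its hidden form fields, set __EVENTTARGET, and post
# the form back to the same URL. Assumes the server answers with CSV text.
download_csv <- function(page_url, event_target) {
  # One shared handle so that any session cookies carry over to the POST.
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  html <- RCurl::getURL(page_url, curl = curl)

  fields <- hidden_fields(html)
  fields[["__EVENTTARGET"]] <- event_target
  fields[["__EVENTARGUMENT"]] <- ""

  RCurl::postForm(page_url, .params = fields, style = "POST", curl = curl)
}

# Usage (needs network access; the event target is a placeholder):
# csv_text <- download_csv(
#   "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
#   event_target = "<value from the doPostBack call>")
# holdings <- read.csv(text = csv_text)
```

Whether the server accepts such a request depends on details of the page that may change at any time, and the caveat above about the site's Terms of Service applies here too.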
Re: [R] Finding the right url for RCurl
Try this:

    library(XML)
    readHTMLTable('http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX',
                  which = 13, header = TRUE)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
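A hard-coded index such as which = 13 will break if the page layout changes. One possible alternative, sketched here under assumptions, is to read all tables and pick the one whose column names match. find_table() is a hypothetical helper, and the column names in the usage example are taken from the output reported later in this thread, so they may need adjusting.

```r
# Return the first data frame in `tables` whose column names include every
# name in `required`, or NULL if there is no such table.
# readHTMLTable() without `which` returns a list that may contain NULLs.
find_table <- function(tables, required) {
  for (tbl in tables) {
    if (is.data.frame(tbl) && all(required %in% trimws(colnames(tbl)))) {
      return(tbl)
    }
  }
  NULL
}

# Usage (needs network access and the XML package):
# tables <- XML::readHTMLTable(
#   "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
#   header = TRUE)
# holdings <- find_table(tables, c("Coupon Rate", "Maturity Date"))
```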
Re: [R] Finding the right url for RCurl
Thanks for the help so far. One interesting thing about this particular page is that the data displayed on the website actually differs from the data you can access with the download link. The XML package command works, but the table it produces in R has the following column names:

    x1 = readHTMLTable("http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
                       which = 13, header = TRUE)
    colnames(x1)
    [1] Coupon Rate Maturity Date Ratingâ\u0080 % Weight
    Warning message:
    it is not known that wchar_t is Unicode on this platform

whereas the .csv file you can get with the link has 8 columns, including a PositionDate column, a Shares column, etc. that aren't present in the page's table. What makes this even more confusing is that the XML table contains MORE information than is presented on the page, such as Maturity Date.

What I'm really looking for is a way to access the .csv file. Since the webpage seems to display different data, I doubt that reading info from it will be sufficient.

--Andrew