Re: [R] Finding the right url for RCurl

2010-08-05 Thread Brian Diggs

On 8/4/2010 2:07 PM, AndrewPage wrote:


> Hi all,
>
> I am using RCurl to try and download data from a website, but I'm having
> trouble finding out what URL to use.  Here is the site:
>
> http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
>
> See how in the upper right, above the displayed sheet, there's a link to
> download the data as a .csv file?  When I copy the link's URL and paste it
> into getURL in R, it doesn't work.  That's no surprise, because what gets
> pasted isn't a URL at all.  I was just wondering if there's any way around
> this.
>
> Thanks in advance,
>
> Andrew


I looked at the page.  The link you mentioned runs some JavaScript that
alters some values in a form and posts that form; the response to that
post is the CSV file.  There is no simple URL that points to the file.  I
don't know if RCurl can post forms, but if it can, you may be able to
mimic the form.  The structure of the form starts on line 191 of the
page source (or search for aspnetForm), and appropriate values for
__EVENTTARGET are given in the __doPostBack call on line 258.  Some
understanding of HTML and HTTP may be necessary to follow what is going on.
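For what it's worth, RCurl does provide postForm() for submitting form fields.  A minimal sketch of the postback described above, assuming a standard ASP.NET page: collect the hidden <input> fields (typically __VIEWSTATE and similar), set __EVENTTARGET to the value from the page's __doPostBack call, and post everything back to the same URL.  The helpers attr_value(), hidden_fields() and postback_fields() are hypothetical, the regex parsing assumes double-quoted attributes, and the network part is left commented out because it depends on the live page.

```r
# Hypothetical helper: value of attribute `field` in one HTML tag, or ""
# if the attribute is absent. Assumes double-quoted attribute values.
attr_value <- function(tag, field) {
  m <- regmatches(tag, regexpr(paste0("\\b", field, "=\"[^\"]*\""), tag,
                               perl = TRUE))
  if (length(m) == 0) return("")
  sub(paste0("^", field, "=\"([^\"]*)\"$"), "\\1", m)
}

# Hypothetical helper: name/value pairs of every <input type="hidden"> tag
# in the page (e.g. __VIEWSTATE). A regex is enough for a sketch; an HTML
# parser is more robust on real pages.
hidden_fields <- function(html) {
  tags <- regmatches(html, gregexpr("<input[^>]*type=\"hidden\"[^>]*>", html,
                                    ignore.case = TRUE))[[1]]
  vals <- vapply(tags, attr_value, character(1), field = "value",
                 USE.NAMES = FALSE)
  names(vals) <- vapply(tags, attr_value, character(1), field = "name",
                        USE.NAMES = FALSE)
  as.list(vals)
}

# Field list a browser would post after the JavaScript
# __doPostBack(target, argument) call runs.
postback_fields <- function(html, event_target, event_argument = "") {
  fields <- hidden_fields(html)
  fields[["__EVENTTARGET"]] <- event_target
  fields[["__EVENTARGUMENT"]] <- event_argument
  fields
}

# Not run: needs network access and the live page.
# library(RCurl)
# url  <- "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX"
# page <- getURL(url)
# fields <- postback_fields(page, event_target = "...")  # from __doPostBack
# csv_text <- postForm(url, .params = fields, style = "POST")
```

ASP.NET sites may also expect the session cookie from the first GET, so reusing one curl handle (the `curl` argument of getURL() and postForm()) is worth trying if the post returns the HTML page instead of the CSV.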


I don't know whether this would work.  Also, the site has not made it
easy to download the CSV file directly, which may be intentional.  The
site's Terms of Service may have something to say about doing this
as well.


--
Brian Diggs
Senior Research Associate, Department of Surgery, Oregon Health &
Science University


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding the right url for RCurl

2010-08-05 Thread Henrique Dallazuanna
Try this:

library(XML)
# which = 13 returns only the 13th <table> on the page, as a data frame
readHTMLTable("http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
              which = 13, header = TRUE)
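The index 13 depends on the page layout and breaks if the layout changes.  A sketch of locating the table by a column name instead, assuming the XML package's htmlParse() and readHTMLTable(); find_table() is a hypothetical helper, and the small HTML snippet stands in for the real page.

```r
library(XML)

# Hypothetical helper: positions of the <table> elements whose header row
# contains `column`; the result can be passed as readHTMLTable(which = ).
find_table <- function(doc, column) {
  tables <- readHTMLTable(doc, header = TRUE)
  which(vapply(tables, function(t) column %in% names(t), logical(1)))
}

# Usage on a small stand-in page (the real one needs network access):
html <- "<html><body>
<table><tr><td>Menu</td><td>Links</td></tr><tr><td>a</td><td>b</td></tr></table>
<table><tr><th>Security</th><th>Coupon Rate</th></tr>
<tr><td>X</td><td>5.0</td></tr></table>
</body></html>"
doc <- htmlParse(html, asText = TRUE)
idx <- find_table(doc, "Coupon Rate")
holdings <- readHTMLTable(doc, which = unname(idx[1]), header = TRUE)
```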





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O




Re: [R] Finding the right url for RCurl

2010-08-05 Thread AndrewPage

Thanks for the help so far.  One interesting thing about this particular page
is that the data displayed on the website actually differs from the data you
can get through the download link.  The XML package command works, but the
table it produces in R has the following column names:



> x1 = readHTMLTable("http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
+                    which = 13, header = TRUE)
> colnames(x1)
[1] ""              "Coupon Rate"   "Maturity Date" "Rating†"       "% Weight"
Warning message:
it is not known that wchar_t is Unicode on this platform



whereas the .csv file you can get through the link has 8 columns,
including a PositionDate column, a Shares column, etc., that aren't
present in the page's table.

What makes this even more confusing is that the XML table contains MORE
information than is presented on the page, such as Maturity Date.

What I'm really looking for is a way to access the .csv file, so I doubt
that reading the table from the web page will be sufficient, since it seems
to display different data.
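If a form post like the one Brian described does return the file, the response body is plain CSV text, and read.csv() can parse it without saving anything to disk.  A small sketch, assuming the body arrives as a single character string; the sample rows, the Name column and the date format are hypothetical, and only the PositionDate and Shares column names come from this message.

```r
# Hypothetical sample standing in for the CSV body returned by the form
# post; values and the Name column are made up for illustration.
csv_text <- "PositionDate,Name,Shares
08/04/2010,Security A,1000
08/04/2010,Security B,2500"

# read.csv() parses a string directly through its `text` argument,
# so the response never has to be written to a file.
holdings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The date format is an assumption; adjust `format` to the real file.
holdings$PositionDate <- as.Date(holdings$PositionDate, format = "%m/%d/%Y")
```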

--Andrew


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-the-right-url-for-RCurl-tp2314163p2315461.html
Sent from the R help mailing list archive at Nabble.com.
