[R] Reading in an XLS (really XML) file from website

2015-02-27 Thread Bos, Roger
All,

I am trying to read the S&P 500 constituents from the iShares website using the 
following code:

   library(XLConnect)  # provides readWorksheetFromFile()
   URL <- "http://www.ishares.com/us/239726/fund-download.dl"
   setInternet2(TRUE)
   download.file(url=URL, destfile="temp.xls")
   out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings", header=TRUE,
                                startRow=13)

R returns the following error:

> out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings", 
+  header=TRUE, startRow=13)
Error: IllegalArgumentException (Java): Your InputStream was neither an OLE2 
stream, nor an OOXML stream
In addition: Warning message:
In download.file(url = URL, destfile = "temp.xls") :
  downloaded length 1938303 != reported length 200

Upon further examination, this is because the file is really XML.  Is there 
any way to get XLConnect or any other Excel reader to read in an XML file?  I 
thought XML was for the new Excel format.
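
The "neither an OLE2 stream, nor an OOXML stream" message suggests checking
what the downloaded file actually contains before handing it to a reader. The
following is a minimal sketch in base R, not a definitive test: sniff_format
is a hypothetical helper name, the temporary files are stand-ins for a real
download, and only the leading bytes are inspected (a byte-order mark before
the XML declaration is not handled).

```r
# Leading byte signatures of the three formats discussed in this thread
ole2_sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))  # legacy .xls
zip_sig  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))                          # .xlsx is a ZIP
xml_sig  <- charToRaw("<?xml")                                         # plain XML text

# Hypothetical helper: classify a file by its first bytes
sniff_format <- function(path) {
  sig <- readBin(path, what = "raw", n = 8)
  starts_with <- function(magic) {
    length(sig) >= length(magic) && identical(sig[seq_along(magic)], magic)
  }
  if (starts_with(ole2_sig)) return("OLE2 (legacy .xls)")
  if (starts_with(zip_sig)) return("ZIP container (possibly .xlsx)")
  if (starts_with(xml_sig)) return("XML text")
  "unknown"
}

# Stand-in files; with a real download one would pass "temp.xls" instead
xml_file <- tempfile(fileext = ".xls")
writeLines('<?xml version="1.0"?>', xml_file)
ole_file <- tempfile(fileext = ".xls")
writeBin(ole2_sig, ole_file)

print(sniff_format(xml_file))
print(sniff_format(ole_file))
```

A file that reports "XML text" despite its .xls extension would explain why
a reader expecting OLE2 or OOXML input rejects it.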

Barring that, can we read in the file using the XML package? I tried the 
following code...

   require(XML)
   tmp <- xmlParse(URL)

... but I get this error:

Opening and ending tag mismatch: Style line 14 and Style
Error: 1: Opening and ending tag mismatch: Style line 14 and Style

Thanks in advance for any help or hints,

Roger



***
This message and any attachments are for the named person's use only.
This message may contain confidential, proprietary or legally privileged
information. No right to confidential or privileged treatment
of this message is waived or lost by an error in transmission.
If you have received this message in error, please immediately
notify the sender by e-mail, delete the message, any attachments and all
copies from your system and destroy any hard copies. You must
not, directly or indirectly, use, disclose, distribute,
print or copy any part of this message or any attachments if you are not
the intended recipient.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading in an XLS (really XML) file from website

2015-02-27 Thread Raghuraman Ramachandran
This works:
Change the destination directory to suit you.

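# date1 must already be defined (its definition is not shown in this message);
# method="wget" requires the wget program to be installed and on the PATH.
MyURL1 = "http://www.ishares.com/us/239726/fund-download.dl"
download.file(MyURL1, paste("C:/Data/Rtest1", date1, "r.xls", sep=""),
              method="wget", quiet=TRUE, mode="wb",
              extra="--header=\"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0\"")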

Cheers
Raghu
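
The download call above uses date1 without defining it. One plausible
reading, offered only as an assumption, is that date1 is a date stamp that
makes each day's file name unique. A minimal sketch under that assumption
(make_destfile is a hypothetical helper, and the "%Y%m%d" format is a guess,
not something stated in the message):

```r
# Assumption: date1 is a compact date stamp such as "20150227".
# make_destfile is a hypothetical helper, not part of the original post.
make_destfile <- function(dir = "C:/Data", prefix = "Rtest1", date = Sys.Date()) {
  date1 <- format(date, "%Y%m%d")
  # Same pieces as paste("C:/Data/Rtest1", date1, "r.xls", sep="")
  file.path(dir, paste0(prefix, date1, "r.xls"))
}

dest <- make_destfile(date = as.Date("2015-02-27"))
print(dest)
```

The result could then be passed as the destfile argument of download.file().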




Re: [R] Reading in an XLS (really XML) file from website

2015-02-27 Thread John McKown

The problem is indeed on line 14 of the file. The contents of that line
are:

</style>

but should be

</ss:style>

That is, the file is malformed. I edited the file to make that change and
saved it. After I did this, I was able to open it as a spreadsheet using
LibreOffice. I did all of this on my home Linux system. I don't have
Windows, and thus no Excel either, available here, so I can't test with
Excel. You should be able to download this file as shown by Raghuraman. On
Windows (which I _assume_ you are using since most do), you can edit the
file using Notepad, or Wordpad. I would use Wordpad myself. Notepad is
iffy on some things. Save it back, then try readWorksheetFromFile() as
you originally did.
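
The manual edit described above could also be done from R. This is a minimal
sketch, assuming the only defect is the closing tag on line 14 as reported in
this thread; fix_closing_tag is a hypothetical helper, the file names in the
commented lines are assumptions, and the demo vector is a tiny stand-in, not
the real iShares content.

```r
# Hypothetical helper: replace a mismatched closing tag on one given line
fix_closing_tag <- function(lines, line_no,
                            bad = "</style>", good = "</ss:style>") {
  stopifnot(line_no >= 1, line_no <= length(lines))
  # fixed = TRUE treats `bad` as a literal string, not a regular expression
  lines[line_no] <- sub(bad, good, lines[line_no], fixed = TRUE)
  lines
}

# Tiny in-memory stand-in for the downloaded file
demo <- c("<ss:styles>", "<ss:style>", "</style>", "</ss:styles>")
fixed <- fix_closing_tag(demo, line_no = 3)
print(fixed[3])

# With the real download one might do (file names are assumptions):
# raw <- readLines("temp.xls", warn = FALSE)
# writeLines(fix_closing_tag(raw, line_no = 14), "temp_fixed.xml")
```

The repaired file could then be tried with xmlParse() from the XML package,
as in the original post.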


-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown
