Christian Seberino wrote:
> How do I extract the visible numerical data from this Microsoft financial
> web site?
>
> http://tinyurl.com/yw2w4h
>
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
>
> Surely if I can see the data in my browser I can grab it somehow
> in a script?
>
> Any help greatly appreciated.
The data you want is in an inline frame (iframe) that is filled by a
second HTTP request.

I presume you want to automate this, but to get a feel for it you can
first try it by hand in the browser:

  1. Right-click inside the income statement data.
  2. Choose 'This Frame > Open in new tab'.
  3. In the new tab, choose (menu) 'View > Page Style > No Style'.

You could always select the text and paste it somewhere, but to automate
it, maybe you could do something like the following:
Look at the main page source and at the location bar of the second tab,
and you can see how the HTTP request is constructed from the
<IFRAME...src="xxx" part.
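
Computing the iframe address can be sketched in Python with only the
standard library. This is a rough illustration, not a tested scraper for
that particular site: the names IframeFinder and iframe_urls, and the
example.com URLs, are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <IFRAME SRC="..."> is matched as well.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Return absolute URLs for all iframes found in page_html.

    page_url is the address the page was fetched from; relative
    src values are resolved against it.
    """
    finder = IframeFinder()
    finder.feed(page_html)
    finder.close()
    return [urljoin(page_url, src) for src in finder.sources]


if __name__ == "__main__":
    # Hypothetical page and URL, for illustration only.
    sample = '<html><body><IFRAME SRC="/frames/income.asp"></IFRAME></body></html>'
    print(iframe_urls(sample, "http://example.com/reports/main.asp"))
```

If the page builds the iframe address with JavaScript instead of a plain
src attribute, this will find nothing and you would have to reconstruct
the address by hand from what the location bar shows.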
To automate, use some fetch tool (like GET from perl-libwww-perl, or
urlgrabber if you want a Python tool):

  1. GET the first page, and compute the HTTP address for the iframe.
  2. GET the iframe.
  3. Extract the text; links -dump works nicely for that:
- - -
Financial data in U.S. Dollars
Values in Millions (Except for per share items)

                        2006        2005        2004        2003        2002
Period End Date         02/25/2006  02/26/2005  02/28/2004  03/01/2003  03/02/2002
..snip..
Diluted Normalized EPS  1.92        1.65        1.31        1.0         0.74
- - -
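
If you would rather stay in Python for the last two steps as well, here
is a minimal sketch using only the standard library. It is a crude
stand-in for links -dump, not a reimplementation of it: it just emits
one line of text per table row or block element. The names fetch,
TextDumper and html_to_text are invented for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextDumper(HTMLParser):
    """Collect visible text, starting a new line at each row or block tag."""

    BREAK_TAGS = {"tr", "br", "p", "div", "table", "li", "h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = [[]]    # each line is a list of text fragments
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BREAK_TAGS:
            self.lines.append([])

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BREAK_TAGS:
            self.lines.append([])

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip_depth:
            self.lines[-1].append(text)

    def text(self):
        # Join fragments (for example table cells) with two spaces
        # and drop lines that ended up empty.
        return "\n".join("  ".join(cells) for cells in self.lines if cells)


def html_to_text(html):
    """Return the visible text of an HTML string, one row per line."""
    dumper = TextDumper()
    dumper.feed(html)
    dumper.close()
    return dumper.text()


def fetch(url):
    """GET a URL and return the body decoded as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Offline demo on a tiny table; for real use, pass fetch(iframe_url).
    sample = ("<table><tr><td>Diluted Normalized EPS</td>"
              "<td>1.92</td><td>1.65</td></tr></table>")
    print(html_to_text(sample))
```

Once you have the iframe address, html_to_text(fetch(address)) gives you
plain lines that you can split into label and numbers. Some sites refuse
requests without a browser-like User-Agent header, in which case fetch
would need a Request object with that header set.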
Regards,
..jim