Christian Seberino wrote:
> How do I extract the visible numerical data from this Microsoft financial
> web site?
> 
> http://tinyurl.com/yw2w4h
> 
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
> 
> Surely if I can see the data in my browser I can grab it somehow
> in a script?
> 
> Any help greatly appreciated.

The data you want is in an inline frame (iframe) that is filled by a
second HTTP request.

I presume you want to automate it, but to get a feel for it, you can
first explore the frame by hand in the browser:
  Right-Click inside the income statement data
  choose 'This Frame > Open in new tab'
  in the new tab, choose (menu) 'View > Page Style > No Style'

You could always select the text and paste somewhere, but, to automate,
maybe you could do something like the following:

Compare the main page source with the location bar of the second tab,
and you can see how the HTTP request is constructed from the
<IFRAME...src="xxx" part.
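
If you go the Python route, the iframe lookup could be sketched roughly
as below. This uses only the standard library rather than urlgrabber,
and the names IframeFinder, iframe_urls and fetch are my own, made up
for illustration. It assumes the src attribute is present in the static
HTML; if the page builds it with JavaScript, you would have to
reconstruct the address by hand instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <IFRAME SRC=...> is matched as well.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Return absolute URLs for all iframes found in page_html."""
    finder = IframeFinder()
    finder.feed(page_html)
    # A relative src is resolved against the address of the outer page.
    return [urljoin(page_url, src) for src in finder.sources]


def fetch(url):
    """Download a URL and return its body as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be something like iframe_urls(fetch(page_url), page_url),
followed by fetch() on whichever of the returned addresses holds the
income statement.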

To automate, use a fetch tool (such as GET from perl-libwww-perl, or
urlgrabber if you want a Python tool):

  GET the first page, and compute the HTTP address for the iframe
  GET the iframe
  pass the result through links -dump, which works nicely to extract
  the text

- - -
Financial data in U.S. Dollars
Values in Millions (Except for per share items)

                             2006       2005       2004       2003       2002
Period End Date        02/25/2006 02/26/2005 02/28/2004 03/01/2003 03/02/2002
..snip..
Diluted Normalized EPS       1.92       1.65       1.31        1.0       0.74
- - -
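
Once you have the text dump, turning a row into numbers is mostly a
matter of splitting on whitespace. A minimal Python sketch follows,
assuming each row sits on a single line (label first, values after).
The name parse_row and the NUMBER pattern are hypothetical, not part of
any tool mentioned above.

```python
import re

# Matches plain numbers such as 1.92, -0.74 or 1,234.5 (but not dates).
NUMBER = re.compile(r"-?\d[\d,]*\.?\d*$")


def parse_row(line):
    """Split 'Label  v1  v2 ...' into (label, [floats]).

    Returns None when the line does not end in at least one number.
    """
    tokens = line.split()
    values = []
    # Peel numeric tokens off the right-hand end; the rest is the label.
    while tokens and NUMBER.match(tokens[-1]):
        values.append(float(tokens.pop().replace(",", "")))
    if not values:
        return None
    values.reverse()
    return " ".join(tokens), values
```

For the EPS line above this gives the label "Diluted Normalized EPS"
and five floats. Rows whose cells are dates, such as Period End Date,
return None and need their own handling.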


Regards,
..jim


-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list