Karsten Harazim wrote:
I wonder whether it is possible to extract information from existing websites into an Excel document, for example extracting all the names, addresses, phone numbers, email addresses, URLs, etc. from pages like this one: http://www.muenster.de/schulen-alle-1.html
Technically, you need something like XSLT to do this, although you are rather dependent on the author actually writing HTML in the true spirit of HTML (marking up the structure of the content rather than its appearance), which is rather rare. You may need to convert the HTML to XML syntax before using XSLT.
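As a rough illustration only, here is a minimal sketch that swaps the XSLT route for Python's standard-library html.parser, which tolerates HTML that is not well-formed XML (e.g. omitted </td> or </tr>), and writes CSV, which Excel can open. It assumes, hypothetically, that each entry on the page is one table row; the real layout of the muenster.de page is not known here. The names TableRowExtractor, html_table_to_csv and SAMPLE, and the sample data, are made up for the example.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th> cell, grouped
    by <tr> row, followed by any href values (e.g. mailto:) found in that row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = None     # cell strings of the row being read
        self._links = None   # hrefs seen in the row being read
        self._cell = None    # text fragments of the cell being read

    def _close_cell(self):
        # Normalise whitespace and store the cell; tolerates an omitted </td>.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        # Tolerates an omitted </tr>: a new <tr> or </table> ends the row.
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row + self._links)
            self._row = self._links = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row, self._links = [], []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []
        elif tag == "a" and self._links is not None:
            href = dict(attrs).get("href")
            if href:
                self._links.append(href)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html_text):
    """Return the table rows of html_text as CSV text."""
    parser = TableRowExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Invented sample markup, standing in for a page fetched with curl or wget.
SAMPLE = """<table>
<tr><td>Example School</td><td>Musterstr. 1, 48143 Münster</td>
<td>0251 123456</td><td><a href="mailto:info@example.org">Mail</a></td></tr>
</table>"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE), end="")
    # Example School,"Musterstr. 1, 48143 Münster",0251 123456,Mail,mailto:info@example.org
```

This only recovers whatever structure the page's tables happen to carry; splitting an address cell into street and town, or pulling phone numbers out of free text, would still need page-specific rules.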
For the actual download, you would be better off using one of the specialist tools, such as curl or wget.
However, actually doing so is likely to be illegal. Even if the information is a pure collection of facts, in countries like the UK it would be covered by a database copyright. At least one reason why Lynx can get blocked from sites is that it is often used to extract information without the surrounding advertising/branding.
-- 
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.

_______________________________________________
Lynx-dev mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/lynx-dev
