On Tue, 2 Aug 2005, Jari Tuominen wrote:

Hi

I am programming a Web crawler and an indexer.

I am using Lynx to convert HTML documents into text files, with the command "lynx -dump".

The problem is that it converts relative URLs into FILE:///db/www/... style URLs.

Yes, because the document you pointed it at has that type of URL.
True, Lynx is interpreting the URLs (it resolves each relative link against the document's own URL), but it's not changing their type.
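
To illustrate the point, here is a minimal sketch in Python (not Lynx's own code) of how a relative link takes on the scheme of whatever document it is resolved against. The file name index.html, the link docs/page.html, and the host www.example.org are made up for the example.

```python
from urllib.parse import urljoin


def resolve(base_url, href):
    """Resolve a link found in a document against that document's URL."""
    return urljoin(base_url, href)


# Hypothetical inputs: the same relative link, resolved against a local
# copy of a page and against the page's address on the web.
local = resolve("file:///db/www/index.html", "docs/page.html")
remote = resolve("http://www.example.org/index.html", "docs/page.html")

print(local)   # a file: URL, because the base document was a file: URL
print(remote)  # an http: URL, because the base document was an http: URL
```

In other words, the type of the resolved URL follows from the base document, so dumping a local file will give local links.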

I am using Lynx to extract links from the HTML files, so I need to do a lot of work to convert those local URLs back to relative ones, which I can then combine with the host name to create an absolute web URL.

If you know of any program other than Lynx that does similar tasks with the same performance, I would be interested to hear about it, thanks...
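
The conversion described above could be sketched roughly as follows in Python. This is only one possible approach under stated assumptions: the function name file_url_to_web_url is hypothetical, /db/www is assumed to be the directory that holds the downloaded pages, and http://www.example.org stands in for the real host.

```python
from urllib.parse import urlsplit


def file_url_to_web_url(file_url, local_root, site_base):
    """Map a file: URL under local_root back to a URL on site_base.

    local_root and site_base are assumptions supplied by the caller;
    URLs that are not file: URLs are returned unchanged.
    """
    parts = urlsplit(file_url)  # urlsplit lowercases the scheme
    if parts.scheme != "file":
        return file_url

    root = local_root.rstrip("/") + "/"
    if not parts.path.startswith(root):
        raise ValueError("URL is outside the local root: " + file_url)

    # The part after the local root is the path relative to the site.
    relative = parts.path[len(root):]
    url = site_base.rstrip("/") + "/" + relative
    if parts.fragment:
        url += "#" + parts.fragment
    return url


# Hypothetical example values.
print(file_url_to_web_url("FILE:///db/www/docs/page.html",
                          "/db/www", "http://www.example.org"))
```

Links that already point elsewhere (http:, ftp:, mailto:) pass through untouched, so the function can be applied to every link in the dump.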

I'm not sure if you'll find one (sorry).

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net


_______________________________________________
Lynx-dev mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/lynx-dev
