Re: strategies for discovering URLs

Dominic Mitchell Wed, 10 Jul 2002 05:42:05 -0700

Keary Suska wrote:
> on 7/9/02 7:18 AM, [EMAIL PROTECTED] purportedly said:
> 
> 
>>Hello everyone,
>>
>>Probably a bit off-topic, but ...  I was wondering if anyone could share
>>their strategies for discovering URLs from webpages.
>>
>>In the past I've used combinations of the truss command and Netscape
>>(Solaris) to ferret out URLs from webpages that don't want to share them,
>>which works pretty well, but is OS/browser dependent.  Any other good
>>techniques (Perl/libwww?) out there?
>
> I am not sure what you are asking here. There is no way to hide URLs from
> LWP that are not hidden from any web browser. Certainly, client-side
> scripting such as JavaScript can make gathering URLs a bit more difficult,
> but if you know JavaScript, it can be figured out without much trouble
> (usually). Trussing Netscape seems like more work than should be necessary.


Even easier than that is to set up a proxy server and point your browser
at it.  Then you'll be able to log anything that passes through it.  A
quick Google search turns up:

http://www.stonehenge.com/merlyn/WebTechniques/col11.html
http://www.stonehenge.com/merlyn/WebTechniques/col34.html

Both discuss writing proxy servers in Perl.  With a few slight
adjustments, they could probably serve your purpose very well by logging
everything that passes through them.
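
To make the idea concrete, here is a rough sketch of a logging proxy.
It is not the code from the columns above (those are in Perl); it is a
minimal Python illustration using only the standard library.  The names
LoggingProxy, seen_urls and start_logging_proxy are made up for this
example, and it only handles plain-HTTP GET requests.

```python
import http.server
import threading
import urllib.error
import urllib.request


class LoggingProxy(http.server.BaseHTTPRequestHandler):
    """Forward plain-HTTP GET requests and record every URL requested."""

    seen_urls = []  # hypothetical shared log; one entry per request

    def do_GET(self):
        # A client configured to use a proxy puts the absolute URL in the
        # request line, so self.path is the full URL being fetched.
        url = self.path
        self.seen_urls.append(url)

        # Empty ProxyHandler: fetch directly, ignoring system proxy settings.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        try:
            with opener.open(url, timeout=10) as upstream:
                status = upstream.status
                headers = upstream.headers
                body = upstream.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx replies still carry a status, headers and a body.
            status, headers, body = err.code, err.headers, err.read()
        except (urllib.error.URLError, OSError, ValueError) as exc:
            self.send_error(502, "Upstream fetch failed", str(exc))
            return

        # Relay the upstream reply to the client.
        self.send_response(status)
        self.send_header(
            "Content-Type",
            headers.get("Content-Type", "application/octet-stream"),
        )
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep stderr quiet; the URLs are collected in seen_urls


def start_logging_proxy(port=8080):
    """Run the proxy on 127.0.0.1 in a background thread (port 0 = any)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), LoggingProxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To try it, call start_logging_proxy(), set the browser's HTTP proxy to
127.0.0.1 port 8080, browse the page in question, and then inspect
LoggingProxy.seen_urls.  Two limitations of this sketch: urllib follows
redirects itself, so only the URL the browser originally asked for is
logged, and HTTPS traffic (which browsers send to a proxy as CONNECT
requests) is not handled at all.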

-Dom
