At 11:43 AM -0800 1/7/00, David Schwartz wrote:
> If it really is the URLs eating memory, perhaps we need a patch
> that allows the to-be-swept URLs to be stored in a different way
> (perhaps each depth should write the URLs for the next greater
> 'depth' into a file?). It'd be very convenient for me to be able
> to dig 400,000 URLs in a pass.
Yes, but I'm pretty confident you'd be upset with the performance.
Remember that it's not like it can just decide a URL is relatively
unimportant. It needs to know which URLs have already been visited
as well as which ones are already sitting in the queue. So if it
writes out part of the URL list to disk, it'll have to check that
disk file for every new link it comes across.
If someone has a great idea for getting around this, I'm all ears.
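
To make the cost concrete, here's a rough sketch of the check I'm
talking about. It's only an illustration with made-up names (SeenURLs,
spilled_urls.txt), not the actual htdig code:

#include <fstream>
#include <iostream>
#include <set>
#include <string>

// Illustrative only -- not the actual htdig data structures.
// The crawler has to answer "have I seen this URL before?" for
// every link it finds, against both visited and queued URLs.
class SeenURLs
{
public:
    SeenURLs(const std::string &spill_file) : spill_path(spill_file) { }

    // Cheap in-memory lookup, but any URLs spilled to disk force
    // a scan of the spill file for *every* new link.
    bool isNew(const std::string &url)
    {
        if (in_memory.count(url))
            return false;
        std::ifstream spill(spill_path.c_str());
        std::string line;
        while (std::getline(spill, line))   // disk hit per link -- the slow part
            if (line == url)
                return false;
        return true;
    }

    void add(const std::string &url) { in_memory.insert(url); }

private:
    std::set<std::string> in_memory;   // URLs already visited or queued
    std::string spill_path;            // URLs written out to save RAM
};

int main()
{
    SeenURLs seen("spilled_urls.txt");  // hypothetical spill file name
    seen.add("http://wso.williams.edu/");
    std::cout << seen.isNew("http://wso.williams.edu/") << "\n";  // 0 (already known)
    std::cout << seen.isNew("http://www.htdig.org/") << "\n";     // 1 (new)
    return 0;
}

One possible way around the per-link scan would be to keep a compact
in-memory summary of the spilled URLs (a hash of each one, say) and
only go to disk on a possible match -- but that spends memory again,
which is what we were trying to save.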
> If it's not the URLs, what is it?
Hey, you're stealing my question! ;-)
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/