On Tue, 13 Apr 2004, Tinni wrote:

> For url indexing, i found there are lots of urls which are different from our site.
> In your explanation below - it is the point one you mentioned...
>
> I have used 'start_url' though, but still it is spidering different urls. Now i
> used the following parameters.

What do your start_url and limit_urls_to attributes look like? These two
matter most for keeping the dig within the intended sites.

> common_url_parts: http://www.example.com/

This attribute just provides a way to reduce the amount of space used for
common strings in the database. It doesn't affect which URLs are indexed.

> local_urls: http://www.example.com/

This attribute just requests that htdig grab the files directly from the
local file system rather than going through the web server. Again, it
doesn't affect which URLs are indexed.

> local_urls_only: true

This attribute says that only files available through the local filesystem
are to be indexed. It might very well be limiting the URLs being indexed,
but in a roundabout way, and perhaps even an incorrect way, depending on
exactly what you are trying to accomplish.

> I am merging all the files now.. Thre are 20 sites i need to merge. While i was
> creating the individual database, i found one file (huge volume) is being created as
> named "core". What is the file for? I have deleted the file, it seems it is a
> binary..

In most cases finding a big file named core is a bad thing. It means that
some program is crashing. The core file contains a lot of information about
the program and the state it was in when it crashed. Running the command
'file core' might provide some insight into which program is crashing.

Jim
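
P.S. For illustration, here is a minimal sketch of the two attributes that actually bound the dig, assuming a single site at http://www.example.com/ (the host from the quoted settings). The filesystem path in the local_urls line is hypothetical and only shows the URL-prefix=directory form that attribute expects. As far as I know, limit_urls_to defaults to the value of start_url when it is not set, so an explicit line like the one below mainly guards against it having been overridden elsewhere in the config.

```
# htdig.conf fragment (illustrative, not a complete configuration)

# Where the dig starts.
start_url:      http://www.example.com/

# Only URLs containing one of these strings are followed and indexed.
limit_urls_to:  ${start_url}

# Optional: read matching URLs straight from disk instead of over HTTP.
# The directory after '=' is an assumed document root; adjust as needed.
local_urls:     http://www.example.com/=/var/www/html/
```

If URLs from other hosts still show up with settings like these, the limit_urls_to value actually in effect is the first thing to check.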

