Author: duncan
Email: [EMAIL PROTECTED]
Message:

Hi,

I searched the list, so forgive me if this has already been covered.

I am struggling to get my results to be what I want. I want to spider through X sites and grab _only_ .tgz files, so that every search result points at a .tgz file. I seem to be close, but the only link it gives is the "Index of /some/path/to/tgz/" page.

My search will run across both FTP and HTTP servers, and I have tried various combinations of the following rules:

    #Allow Match .tgz

    # Exclude Apache directory list in different sort order using "string" match:
    Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D

    # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
    # and unix shared libraries. We use "Regex" match type here:
    Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$

    CheckOnly *.tgz
    #CheckOnly [^/]$
    #HrefOnly Match NoCase \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$|\.htm$|\.tgz$
    HrefOnly Match NoCase \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$|\.htm$
    Allow Match .tgz /*
    #Disallow *
    UrlWeight 30
    UrlFileWeight 30

TIA,
duncan
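One way to see which of the rules above catch a given URL is to test the patterns offline. The following is a minimal Python sketch, not mnoGoSearch code. It assumes the string rules behave like shell-style wildcards and the Regex rules like ordinary regular expressions searched over the full URL, and it treats the active HrefOnly pattern as a regex because of its syntax. It does not check how indexer itself reads the rules' match-type keywords, nor the order in which it applies Allow/Disallow/CheckOnly/HrefOnly. The sample URLs are hypothetical.

```python
import fnmatch
import re

# Hypothetical sample URLs; none of these come from the original post.
SAMPLE_URLS = [
    "ftp://ftp.example.org/pub/",
    "http://www.example.org/pub/?N=D",
    "http://www.example.org/pub/index.html",
    "http://www.example.org/pub/foo-1.0.tgz",
    "http://www.example.org/pub/libfoo.so.1",
    "http://www.example.org/pub/archive.r01",
]

# Wildcard patterns copied from the Apache sort-order "Disallow" rule and
# the "CheckOnly" rule, matched here with case-sensitive fnmatch
# (an assumption about how string matching behaves).
APACHE_SORT = ["*D=A", "*D=D", "*M=A", "*M=D", "*N=A", "*N=D", "*S=A", "*S=D"]
CHECKONLY = ["*.tgz"]

# Regular expressions copied from the "Disallow Regex" and active "HrefOnly"
# rules; NoCase is approximated with re.IGNORECASE.
DISALLOW_RE = [
    re.compile(p) for p in (r"\.r[0-9][0-9]$", r"\.a[0-9][0-9]$", r"\.so\.[0-9]$")
]
HREFONLY_RE = re.compile(
    r"\/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$|\.htm$", re.IGNORECASE
)


def classify(url):
    """Return the names of every rule whose pattern matches url."""
    hits = []
    if any(fnmatch.fnmatchcase(url, p) for p in APACHE_SORT):
        hits.append("Disallow (sort order)")
    if any(p.search(url) for p in DISALLOW_RE):
        hits.append("Disallow Regex")
    if HREFONLY_RE.search(url):
        hits.append("HrefOnly")
    if any(fnmatch.fnmatchcase(url, p) for p in CHECKONLY):
        hits.append("CheckOnly")
    return hits


for url in SAMPLE_URLS:
    print(f"{url:45} {classify(url) or ['no rule']}")
```

Under these assumptions, the .tgz sample is caught only by the CheckOnly pattern, while the trailing-slash directory URL is caught only by HrefOnly.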
