Ok.  I got the patch and recompiled.  Unfortunately, I stink at regexes.
I started checking out some online docs for awk-style regex help; there are
lots of them.  I have tried almost 50 different types of regexes, and none of
them has come close to matching what I want to do.

To start with, I am not even looking to do the complicated search/replace.  I
am just trying to follow the example included with the documentation, mutated
for my needs.  Here is what I have:

url_rewrite_rules:      (.*)\files/[0-9*]/.* \\1

Hopefully, that will find a string somewhere in the line that has 'files/'
followed by 'any number of numbers', followed by a '/', followed by
anything (number or character), and then replace it with an empty string
(thus eliminating everything after the final '/' in the search string).
This does nothing to my searches.  My results are still
"http://mydomain.com/files/0001/0300/0002/document.doc", etc.
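A quick way to see why the first rule never fires is to try the pattern
outside htdig.  The Python sketch below uses Python's `re` module, not the
regex library the patch itself uses, so treat it only as an approximation.  It
also drops the backslash before `files`, because `\f` means a form feed in
Python and has no portable meaning in POSIX regexes.  The key point is that
`[0-9*]` is a bracket expression matching exactly ONE character (a digit or a
literal `*`), so it cannot match `0001`; `[0-9]+` (one or more digits) can.

```python
import re

url = "http://mydomain.com/files/0001/0300/0002/document.doc"

# [0-9*] matches a single character: one digit or a literal '*'.
single_char = re.compile(r"(.*)files/[0-9*]/.*")
# [0-9]+ matches one or more digits, i.e. a whole numeric directory name.
one_or_more = re.compile(r"(.*)files/[0-9]+/.*")

print(bool(single_char.search(url)))  # False: "0001" is four characters long
print(bool(one_or_more.search(url)))  # True
```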

Next, I tried this as a variation of the above:

url_rewrite_rules:      (.*)\files/[0-9*].*  \\1

This won't even search anything past the first all-number directory after the
'files' directory.
I have tried the full URL ('http://etc,etc,etc'), I have tried going 3
directories down (\files/[0-9]*/[0-9]*/[0-9]*/.*), and many other
combinations.  None of them have actually rewritten a URL.  The closest I
have come is denying myself the ability to even index the full directory
structure.
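Running the second rule through the same kind of Python sketch (again only an
approximation of the patch's regex engine, with the backslash before `files`
dropped) suggests what is going on, assuming the patch treats `\\1` as a
back-reference to the first parenthesized group.  Here `[0-9*]` only has to
match the first character after `files/`, so the pattern does match, but the
whole URL is then replaced by group 1 alone, which is everything before
`files/`.  Every document URL under `files/` would collapse to the same
site-root URL.

```python
import re

url = "http://mydomain.com/files/0001/0300/0002/document.doc"

# [0-9*] now only needs to match the "0" right after "files/",
# and .* swallows the rest, so the whole URL matches.
pattern = re.compile(r"(.*)files/[0-9*].*")

# The replacement keeps only group 1: the text BEFORE "files/".
rewritten = pattern.sub(r"\1", url)
print(rewritten)  # http://mydomain.com/
```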
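For the rewrite from the original question (`.../files/0001/0300/0002/document.doc`
to `.../server-java/servlet&DOCUMENTID=0001`), the rule needs to capture the
site prefix and the first numeric directory as separate groups and rebuild the
URL from both.  The Python sketch below shows one such pattern; whether the
aarmstrong patch accepts the same syntax (`+`, the `^`/`$` anchors, and a second
back-reference `\\2` in the replacement) is an assumption to check against its
README.

```python
import re

url = "http://mydomain.com/files/0001/0300/0002/document.doc"

# Group 1: everything up to (not including) "/files/".
# Group 2: the first all-digit directory after "files/" (the document ID).
# The next two numeric directories and the file name are matched but dropped.
rule = re.compile(r"^(.*)/files/([0-9]+)/[0-9]+/[0-9]+/.*$")

rewritten = rule.sub(r"\1/server-java/servlet&DOCUMENTID=\2", url)
print(rewritten)  # http://mydomain.com/server-java/servlet&DOCUMENTID=0001
```

Python treats `&` in the replacement literally; in some other tools (sed, for
instance) `&` stands for the whole match and would need escaping.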


Can someone please help me understand this thing?

Thanks again,
David


On Mon, 19 Mar 2001 18:40:07 -0500 (EST), Geoff Hutchison said:

> On 19 Mar 2001, David Patterson wrote:
>  
>  > it.  Basically, I need a *specific* part of a url rewritten.  The problem is
>  > that the part that needs to be rewritten is always different.  I'll give an
>  > example:
>  > 
>  > From:
>  > http://mydomain.com/files/0001/0300/0002/document.doc
>  > 
>  > To:
>  > http://mydomain.com/server-java/servlet&DOCUMENTID=0001
>  
>  This depends. What it seems like you want is more flexible regex-based
>  rewriting. But in this case, the rewriting is done *permanently*, whereas
>  url_part_aliases does this in a fashion that can be done
>  "temporarily" just for searching.
>  
>  The patch for URL regex rewriting by Andy Armstrong is at:
>  <ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.README>
>  <ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdig-3.1.5.aarmstrong.tar.gz>
>  
>  --
>  -Geoff Hutchison
>  Williams Students Online
>  http://wso.williams.edu/
>  

-- 
David Patterson
Unix/Linux Administrator
TIMPO/HIRS
2421 Dickman Road, Ste 84
(Bldg 1001, Reid Hall)
Ft Sam Houston, TX 78234-5084
(210) 295-2575



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
