According to Bill Giese:
> Sorry for yet another javascript indexing inquiry - yes, I've read the FAQ, 
> and have looked into the options available.
> 
> I noticed the new url_rewrite_rules option, and am wondering if it might 
> solve my particular javascript indexing problem.
> 
> I'm trying to index an MSN Groups site, and almost have it working. There 
> are certain javascript links that need to be crawled, but the actual URLs 
> could be easily identified with a regex. Here's an example link:
> 
> <a class=Command href="JavaScript:navAway('RelativePathToURL');" >
> 
> The actual link is nothing more than the parameter passed to the 
> javascript. Can I use the url_rewrite_rules option to strip away the 
> javascript call, leaving nothing but the actual relative URL? If so, could 
> some kind soul offer up the proper regex for the above? My skills in this 
> area, well, suck...

I can say right away that this trick won't work in 3.1.6.  In that version,
htdig only allows http:// URLs, and it does this validity checking before
any URL rewriting, so a JavaScript: link is thrown out before a rewrite
rule ever gets to see it.

On the other hand, the 3.2.0b4 code seems to do URL rewriting earlier on,
so it just might work there.  Try something like...

url_rewrite_rules: JavaScript:navAway('\\(.*\\)');  http://mysite/\\1
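If you want to check the pattern outside of htdig before reindexing, here
is a minimal sketch in Python of what the rule above is meant to do.  It is
only an illustration, not htdig code: Python's regex syntax is not the same
as the one htdig compiles (in Python the literal parentheses of the call are
escaped and the capturing group is not), and "mysite" is just the
placeholder host from the rule.  The function name rewrite is made up for
this example.

```python
import re

# Placeholder base URL taken from the rule above; substitute the real site.
BASE = "http://mysite/"

# Match JavaScript:navAway('...'); and capture the text between the quotes.
# The call's own parentheses are escaped; the unescaped pair is the group.
NAV_AWAY = re.compile(r"^javascript:navAway\('([^']*)'\);?$", re.IGNORECASE)


def rewrite(href):
    """Return the rewritten URL, or href unchanged if it does not match."""
    m = NAV_AWAY.match(href.strip())
    if m is None:
        return href
    # Drop any leading slash so the path joins cleanly onto BASE.
    return BASE + m.group(1).lstrip("/")


print(rewrite("JavaScript:navAway('RelativePathToURL');"))
```

If the htdig rule does not match, the escaping of the parentheses is the
first thing I would look at, since which pair needs the backslashes depends
on the regex flavour the rule is compiled with.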

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
