Try throwing a dollar sign ($) at the end of your expression to indicate that
it's the end of the string.

For example:

http://10.47.23.110:85/firm-info/bios/$

would block just that URL but still allow
http://10.47.23.110:85/firm-info/bios/2904/somethingelse.aspx
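To see what the anchor changes, here is a small Java sketch using java.util.regex. It assumes the URL filter searches for each pattern anywhere in the URL (Matcher.find()) instead of requiring a whole-string match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex is found anywhere in the URL,
    // assuming the filter uses Matcher.find() semantics.
    static boolean matches(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        // Pattern from the post; the unescaped dots match any character,
        // including a literal '.', so they still match the IP address.
        String unanchored = "http://10.47.23.110:85/firm-info/bios/";
        String anchored = unanchored + "$";
        String index = "http://10.47.23.110:85/firm-info/bios/";
        String page = "http://10.47.23.110:85/firm-info/bios/2904/somethingelse.aspx";

        System.out.println(matches(unanchored, index)); // true
        System.out.println(matches(unanchored, page));  // true: the prefix is found
        System.out.println(matches(anchored, index));   // true
        System.out.println(matches(anchored, page));    // false: $ requires the end of the URL
    }
}
```

Without the $, the pattern hits every URL that merely starts with the bios/ prefix; with it, only the bare directory URL matches.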

You could also experiment with a regex like this to block anything that does
not end in .aspx or .ASPX:

http://10\.47\.23\.110:85/firm-info/(.*?)\.(ASPX|aspx)$

The (.*?) matches any run of characters (non-greedily), and \.(ASPX|aspx)$
then requires that run to be followed by .ASPX or .aspx at the very end of the
URL. This means that a URL such as default.aspx?name=value would fail the
match, because the query string comes after the extension.
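A quick way to check the .aspx pattern against a few URLs is a Java sketch like the one below. It again assumes Matcher.find() semantics; the class name, method name and the sample URLs other than the two from this thread are hypothetical.

```java
import java.util.regex.Pattern;

class AspxDemo {
    // The pattern from the post as a Java string literal
    // (each regex backslash is doubled).
    static final Pattern ASPX = Pattern.compile(
        "http://10\\.47\\.23\\.110:85/firm-info/(.*?)\\.(ASPX|aspx)$");

    // Hypothetical helper: true if the URL ends in .aspx or .ASPX.
    static boolean endsInAspx(String url) {
        return ASPX.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://10.47.23.110:85/firm-info/bios/2904/somethingelse.aspx", // true
            "http://10.47.23.110:85/firm-info/bios/DEFAULT.ASPX",            // true
            "http://10.47.23.110:85/firm-info/bios/",                        // false
            "http://10.47.23.110:85/firm-info/default.aspx?name=value"       // false: query string
        };
        for (String url : urls) {
            System.out.println(endsInAspx(url) + "  " + url);
        }
    }
}
```

Because the pattern ends in $, the non-greedy (.*?) behaves the same as a greedy (.*) here. Mixed-case extensions such as .Aspx would not match; prefixing the pattern with (?i) would make the whole match case-insensitive.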

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Skipping-certain-URLs-tp2424735p2997664.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
