Dirk:
--------------------------------------------------------------------------------
Hi,
Many thanks.
Better to have some false positives and catch all links and URLs than to miss one
of them and have no false positives.
Unfortunately, I have no idea how those expressions work.
OK, I am now extracting from this (I added an extra ":" so that the source text is shown as typed):
http:://well.me/dfdfddddf 200 ok text/html
1 1 1 nginx 00:00.799 utf-8
http:://well.me/999 200 ok text/html
2 1 1 nginx 00:00.285 utf-8
http:://well.me/456 200 ok text/html
2 1 2 nginx 00:00.323 utf-8
http:://well.me/8887kku 200 ok text/html
2 1 1 nginx 00:00.311 utf-8
and it extracts this:
http:://well.me/dfdfddddf
00.799
http:://well.me/999
00.285
http:://well.me/456
00.323
http:://well.me/8887kku
00.311
Maybe one could change that.
Many thanks again.
--------------------------------------------------------------------------------
Hi,
well, these matched numbers are the false positives I mentioned ... :-)
You may try the following modified pattern
cite:
--------------------------------------------------------------------------------
((news|http|ftp|https):\/\/)?[\w\-_]+(\.[\w]+)*?(\.[a-z]{2,3})([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
--------------------------------------------------------------------------------
This should ensure the presence of a top-level domain consisting of 2-3 letters;
again, make sure to test it on your data, I only tested it in a very limited way.
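For a quick check outside the editor, the pattern can be tried on a few of the sample lines. The following is only a rough sketch in Python, assuming its "re" module treats the pattern the same way as the editor's regex engine (not verified here); the names URL_PATTERN, SAMPLE and extract_urls are made up for this illustration, and the sample uses the normal "http://" rather than the doubled colon added for display above.

```python
import re

# Pattern copied from the post above, joined onto a single logical line.
# Assumption: Python's "re" flavour is close enough to the editor's engine.
URL_PATTERN = re.compile(
    r"((news|http|ftp|https):\/\/)?[\w\-_]+(\.[\w]+)*?(\.[a-z]{2,3})"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)

# Two of the sample records from the thread, with a normal "http://".
SAMPLE = """\
http://well.me/dfdfddddf 200 ok text/html
1 1 1 nginx 00:00.799 utf-8
http://well.me/999 200 ok text/html
2 1 1 nginx 00:00.285 utf-8
"""


def extract_urls(text):
    """Return every substring of text matched by URL_PATTERN."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


for url in extract_urls(SAMPLE):
    print(url)
```

With the 2-3 letter requirement after the last dot, timings such as 00.799 no longer qualify, because digits cannot stand in for the top-level domain.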
Alternatively, if you know the form of the URLs you want to match, it might be
workable to write a simpler pattern from scratch - a large part of this version
seems to deal with the query part after ?.
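As an illustration of what such a simpler pattern might look like: the sketch below assumes that every URL of interest starts with an explicit scheme and runs up to the next whitespace, which holds for the sample lines but may not hold for other data. It is written in Python; SIMPLE_URL and extract_simple are names invented for this example.

```python
import re

# Assumption: a URL always begins with one of these schemes followed by "://"
# and ends at the next whitespace character.
SIMPLE_URL = re.compile(r"\b(?:https?|ftp|news)://\S+")


def extract_simple(text):
    """Return all scheme-prefixed URLs found in text."""
    return SIMPLE_URL.findall(text)


line = "http://well.me/8887kku 200 ok text/html"
print(extract_simple(line))
```

This version would miss links written without a scheme (for example a bare www address), and it keeps any punctuation that directly follows a URL, so it trades completeness for fewer false positives.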
hth,
vbr
--
<http://forum.pspad.com/read.php?2,62001,62022>
PSPad freeware editor http://www.pspad.com