Re: [CentOS] OT: grep regex pointer appreciated

2011-03-07 Thread Robert Grasso
Hello,

In my opinion, grep is not powerful enough to achieve what you want.
It would be preferable to use one of the old but powerful tools such
as sed or awk or, even better, perl. What you actually need is a tool
providing a capture buffer (Perl jargon; sed calls these back
references) in which you can get the string you want to extract,
rather than trying to build up a positive matching regex, as the
string boundaries seem easy enough to describe with regexes.
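
A minimal sketch of the back-reference idea with sed, assuming the file
names have already been isolated one per line and look like
NAME-VERSION.ARCH.tgz or NAME-VERSION.tar.gz as in the original post.
The function name extract_version is made up for illustration, and the
local sed is assumed to accept -E (extended regexes):

```shell
# Print only the version part of each file name read from stdin.
# Assumption: the name part contains no digits, and the version is a run of
# dotted numbers, optionally followed by '+' and more dotted numbers.
extract_version() {
    # \1 is the back reference to the first parenthesised group (the version).
    sed -n -E 's/^[A-Za-z_-]+-([0-9]+(\.[0-9]+)*(\+[0-9]+(\.[0-9]+)*)?)\..*$/\1/p'
}

# prints 4.5.6, then 1.2.3+1.2.3
printf '%s\n' bar-4.5.6.i686.tgz foo-bar-1.2.3+1.2.3.tar.gz | extract_version
```

Lines that do not fit the assumed pattern are silently skipped because
of -n combined with the p flag.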

Regards

---
Robert GRASSO – System engineer

CEDRAT S.A.
15 Chemin de Malacher - Inovallée - 38246 MEYLAN cedex - FRANCE 
Phone: +33 (0)4 76 90 50 45 - Fax: +33 (0)4 56 38 08 30
mailto:robert.gra...@cedrat.com - http://www.cedrat.com  

 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos
 



Re: [CentOS] OT: grep regex pointer appreciated

2011-03-07 Thread Patrick Lists
On 03/07/2011 12:23 PM, Robert Grasso wrote:
> Hello,
>
> In my opinion, grep is not powerful enough to achieve what you want.
> It would be preferable to use one of the old but powerful tools such
> as sed or awk or, even better, perl. What you actually need is a tool
> providing a capture buffer (Perl jargon; sed calls these back
> references) in which you can get the string you want to extract,
> rather than trying to build up a positive matching regex, as the
> string boundaries seem easy enough to describe with regexes.

Thank you for your advice. After much fiddling I came up with something 
that seems to work. I have never dabbled with perl but will dig up my 
sed/awk book and see if there's a more elegant way to do this.

Regards,
Patrick


Re: [CentOS] OT: grep regex pointer appreciated

2011-03-07 Thread Bill Campbell
On Mon, Mar 07, 2011, Robert Grasso wrote:
> Hello,
>
> In my opinion, grep is not powerful enough to achieve what you want.
> It would be preferable to use one of the old but powerful tools such
> as sed or awk or, even better, perl. What you actually need is a tool
> providing a capture buffer (Perl jargon; sed calls these back
> references) in which you can get the string you want to extract,
> rather than trying to build up a positive matching regex, as the
> string boundaries seem easy enough to describe with regexes.

One can use pcregrep which is grep that groks perl regular
expressions.
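
A rough sketch of what Perl-style regexes add here, using GNU grep's -P
option as a stand-in (assumption: the local grep is built with PCRE
support; the same pattern should also be usable with pcregrep -o). The
lookbehind (?<=-) and lookahead (?=\.) are Perl-only constructs that
let -o print just the version without the surrounding text. The
function name print_versions is hypothetical:

```shell
# Print the version found in each file name read from stdin.
# Assumption: the version directly follows a '-' and is directly followed by a '.'.
print_versions() {
    grep -oP '(?<=-)[0-9]+(\.[0-9]+)*(\+[0-9]+(\.[0-9]+)*)?(?=\.)'
}

# prints 4.5.6, then 1.2.3+1.2.3
printf '%s\n' bar-4.5.6.x86_64.tgz foo-bar-1.2.3+1.2.3.tar.gz | print_versions
```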

Bill
-- 
INTERNET:   b...@celestial.com  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice:  (206) 236-1676  Mercer Island, WA 98040-0820
Fax:(206) 232-9186  Skype: jwccsllc (206) 855-5792

If the government can take a man's money without his consent, there is no
limit to the additional tyranny it may practise upon him; for, with his
money, it can hire soldiers to stand over him, keep him in subjection,
plunder him at discretion, and kill him if he resists.
Lysander Spooner, 1852


[CentOS] OT: grep regex pointer appreciated

2011-03-05 Thread Patrick Lists
Hi,

My grep regex-fu is not very good and googling is getting me nowhere,
so hopefully someone is kind enough to give me some pointers.

Goal: grep (non-.dbg) filenames and versions from an FTP directory
listing and a raw HTML file:

$ wget --no-remove-listing -O ftp-index.txt ftp://127.0.0.1/test/
$ wget --no-remove-listing -O index.html http://127.0.0.1/test/

The relevant parts of the files above (first one is ftp listing, second 
part is the html file, both copied to test_regex.txt) are:

2011 Jan 28 21:25  File  <a href="ftp://127.0.0.1/bar-4.5.6.i686.dbg.tgz">bar-4.5.6.i686.dbg.tgz</a>  (5551274 bytes)
2011 Jan 28 21:25  File  <a href="ftp://127.0.0.1/bar-4.5.6.i686.tgz">bar-4.5.6.i686.tgz</a>  (5551274 bytes)
2011 Jan 28 21:25  File  <a href="ftp://127.0.0.1/bar-4.5.6.x86_64.dbg.tgz">bar-4.5.6.x86_64.dbg.tgz</a>  (5551274 bytes)
2011 Jan 28 21:25  File  <a href="ftp://127.0.0.1/bar-4.5.6.x86_64.tgz">bar-4.5.6.x86_64.tgz</a>  (5551274 bytes)

<tr><td><a href="foo-bar-1.2.3+1.2.3.tar.gz">foo-bar-1.2.3+1.2.3.tar.gz</td></tr>

This is what I now have (improvements most welcome):

$ egrep -o '>([A-Za-z_-]+)([[:digit:]]{1,3}(\.[[:digit:]]{1,3})*).+(.|t)gz<' \
    ./test_regex.txt | grep -v .dbg | tr -d '<>'

Output:

foo-bar-1.2.3+1.2.3.tar.gz
bar-4.5.6.i686.tgz
bar-4.5.6.x86_64.tgz

So far so good but now I also want to get the version numbers which I 
can't figure out. Anyone have a pointer how to get the version number 
from these filenames (1.2.3+1.2.3 and 4.5.6)?

Thanks!
Patrick


Re: [CentOS] OT: grep regex pointer appreciated

2011-03-05 Thread Nico Kadel-Garcia
On Sat, Mar 5, 2011 at 5:13 PM, Patrick Lists
<centos-l...@puzzled.xs4all.nl> wrote:
> So far so good but now I also want to get the version numbers which I
> can't figure out. Anyone have a pointer how to get the version number
> from these filenames (1.2.3+1.2.3 and 4.5.6)?

Separate the architecture part (the .i686 in .i686.tgz) from the
version with something like a '-' or '_', not a dot, and be consistent
about using .tar.gz instead of mixing .tar.gz and .tgz, if possible.
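
To illustrate why such a naming scheme helps, here is a rough sketch
assuming hypothetical names of the form NAME-VERSION-ARCH.tar.gz (for
example bar-4.5.6-x86_64.tar.gz, which is not a name from the original
listing). With a dedicated '-' separator, plain POSIX shell parameter
expansion is enough and no regex is needed. The function name split_pkg
is made up for illustration:

```shell
# Split a file name of the assumed form NAME-VERSION-ARCH.tar.gz into its parts.
split_pkg() {
    base=${1%.tar.gz}     # drop the extension:            bar-4.5.6-x86_64
    arch=${base##*-}      # text after the last '-':       x86_64
    rest=${base%-*}       # drop the trailing '-ARCH':     bar-4.5.6
    version=${rest##*-}   # text after the remaining '-':  4.5.6
    name=${rest%-*}       # everything before the version: bar
    printf '%s %s %s\n' "$name" "$version" "$arch"
}

split_pkg bar-4.5.6-x86_64.tar.gz            # prints: bar 4.5.6 x86_64
split_pkg foo-bar-1.2.3+1.2.3-i686.tar.gz    # prints: foo-bar 1.2.3+1.2.3 i686
```

This only works if neither the version nor the architecture contains a
'-', which is an assumption of the sketch.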