e-letter wrote:
> Bob Proulx wrote:
> > Sounds like a homework assignment.
>
> Well, my own assignment in my own home (managing files)!
Okay then.  You can imagine that a lot of students try to short
circuit things!

> > They can't be.  It doesn't make sense for comm.  Perhaps you should
> > be using 'grep' or 'sed'?
>
> Had a brief look at the manual for 'bash' which makes reference to
> grep, but so far awk and perl are supposed to be options.  Too many
> choices!

Perhaps if you describe your problem in more detail we on the mailing
list could help better?  So far you have said:

> File 1 contains data:
> /some/text/abcd.xyz
>
> File 2:
> abcd.xyz

The first contains fully qualified paths to files and the second one
contains only basenames.  Okay.

> The task is to be able to compare file1 to file2 using a regular
> expression as a criterion for comparison, such as:
>
> *cd.xyz

That looks like a file glob.  It is called globbing because the '*'
matches a glob of characters.  You can get more documentation on that
style of pattern with 'man 7 glob'.  Most importantly, globbing is not
a regular expression.  They have different syntaxes from each other.
The command line uses file globbing.  This would match the file in the
current directory from the command line:

  $ ls *cd.xyz

> Then create a new file 'file3' that contains only those lines that
> satisfy the regular expression, but must contain the same format style
> as in file1.

I suggested this:

  $ grep -F cd.xyz file1 > file3

In the above I suggested -F to turn off regular expressions and to use
the string literally.  Otherwise the '.' matches any character and
should be escaped.  Escaping the dot would be "cd\.xyz".  However that
may be difficult if you have a file of partial filenames.  You would
need to preprocess it first.  It is easier if you can take them as
literal strings.

Or perhaps:

  $ grep -F -f file2 file1 > file3

I wasn't sure what you were asking for.  The above uses file2 as a
list of strings to select lines from file1.  The -f option takes a
file argument.  The patterns will be full regular expressions unless
-F is given too.
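To make the two approaches concrete, here is a small sketch.  The file
names and sample contents are just the ones from the thread (plus one
extra made-up line so there is something to filter out); the sed step
is one possible way to do the preprocessing mentioned above, escaping
each '.' so the lines can be used as regular expressions:

```shell
# Build the sample files from the thread (second file1 line is an
# assumed extra entry so the filter has something to exclude).
printf '%s\n' '/some/text/abcd.xyz' '/other/qq.txt' > file1
printf '%s\n' 'abcd.xyz' > file2

# Fixed-string matching: -F treats each line of file2 literally.
grep -F -f file2 file1 > file3
cat file3

# If regex matching is wanted instead, escape the dots first:
sed 's/\./\\./g' file2 > file2.re
grep -f file2.re file1
```

Both invocations print only /some/text/abcd.xyz here, but the -F form
needs no preprocessing at all, which is why it was suggested first.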
It seems like that will give you exactly what you want.

> Had a brief look at the manual for 'bash' which makes reference to
> grep, but so far awk and perl are supposed to be options.  Too many
> choices!

Bash is a command line shell that also has features that make it good
for controlling other programs.  It is an extension of the /bin/sh
shell.  But it really only has simple strings for data structures.
For simple things it is great.  For more complex things, languages
like Perl (and similarly Ruby, Python and Awk) are better since they
have complex data structures and can do virtually anything you can
think of doing without language limitations.

Awk is the original little scripting language from way back and it is
therefore the most standard and portable.  If I am doing something
relatively simple then I use awk since it can be written to run
everywhere.  But it does have some quirks that make it difficult to
use for larger programs.

Perl appeared on the scene some time later.  It combined a lot of
syntax from the shell, sed, grep, awk, and other utilities.  Its
popularity has probably peaked; it was more popular a few years ago.
The syntax isn't clean and it has a lot of special cases.  This has
caused many people to prefer the newer languages Python and Ruby,
which have appeared on the scene more recently.

How to choose?  If 'grep -F -f file2 file1 > file3' does what you
want then I would definitely use the shell and stop there.  If not
then I would move on to Perl.  Others would suggest Python or Ruby
instead.

Bob
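For comparison, here is one possible awk version of the same task.
It is a sketch, not anything from the thread: it reads the basenames
from file2 into an array, then prints each line of file1 whose last
path component is in that array (so it matches whole basenames rather
than substrings, which is slightly stricter than grep -F):

```shell
# Recreate the sample files from the thread (the /other/qq.txt line
# is an assumed extra entry so the filter has something to exclude).
printf '%s\n' '/some/text/abcd.xyz' '/other/qq.txt' > file1
printf '%s\n' 'abcd.xyz' > file2

# While reading file2 (NR==FNR), remember each basename; then for
# each file1 line, split on '/' and print it if its basename matches.
awk 'NR==FNR { want[$0]; next }
     { n = split($0, parts, "/"); if (parts[n] in want) print }' \
    file2 file1 > file3
cat file3
```

This is more typing than the grep one-liner, but it is plain POSIX
awk and avoids the literal-versus-regex question entirely.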
