Re: problem with GAWK/gsub when substitute new lines

Rita Shen Thu, 18 Jun 2009 17:29:22 -0700

Hi, Ralf,

Thanks for your reply.


I tried the command:

awk 'BEGIN { RS="\&"};{
    gsub(/\[[\n]+\]/, "");print $0}' gawk_test > gawk_modi

But the content of gawk_modi is still the same as that of gawk_test:

<"Reports" [



]>


By the way, what's the difference between RS and FS?

Thanks for your help,
Rita


On Thu, Jun 18, 2009 at 3:49 PM, Ralf Wildenhues <ralf.wildenh...@gmx.de> wrote:

> Hello Rita,
>
> * shaledova wrote on Thu, Jun 18, 2009 at 05:26:44AM CEST:
> >
> > I tried to use gawk to perform some text conversions. But I could not
> > substitute new lines (\n) using gsub such as:
> > gsub(/\[[\n]*\]/, "");
> >
> > For example, if I have a file containing:
> > <"Week Report" [
> >
> >
> >
> > ]>
> >
> > I want to convert these lines to:
> > <"Week Report">
> >
> > What is wrong with the expression?
>
> The expression is ok, but gawk operates on each line in turn by default;
> more specifically, the implicit loop is over records, with RS being the
> record separator, which is a newline by default.  With something like
>  awk 'BEGIN { RS="X" }
>       { gsub(/\[[\n]*\]/, ""); print }'
>
> you can get the above input to turn into
>  <"Week Report" >
>
> (note also the space before the closing > that was not matched).
>
> Of course, this is a kludge and requires your input to not contain X;
> and you might have to adjust the output record separator ORS as well.
>
> However, when parsing nested structures, regular expressions are
> generally not the right tool.  You might be better off writing a small
> state machine that reads the file line by line and just skips printing
> output when inside unwanted [ ] brackets.
>
> Hope that helps.
>
> Cheers,
> Ralf
>
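
The line-by-line state machine suggested in the quoted reply could look roughly like the sketch below. It assumes that each opening [ ends its line, that the matching ] starts a later line, and that brackets are not nested. The function name strip_brackets and the variables inside and held are illustrative only and do not come from the thread.

```shell
#!/bin/sh
# strip_brackets: hypothetical helper, not part of gawk or of this thread.
# Reads the named files (or stdin) line by line and drops [ ... ] blocks.
strip_brackets() {
    awk '
    # Outside a block: a line ending in "[" opens one.
    !inside && /\[[ \t]*$/ {
        sub(/[ \t]*\[[ \t]*$/, "")   # drop the "[" and the blanks around it
        held = $0                    # remember the text before the bracket
        inside = 1
        next
    }
    # Inside a block: a line starting with "]" closes it.
    inside && /^[ \t]*\]/ {
        sub(/^[ \t]*\]/, "")         # drop the "]" itself
        print held $0                # rejoin the prefix and the remainder
        inside = 0
        next
    }
    # Inside a block: skip every other line.
    inside { next }
    # Outside a block: pass the line through unchanged.
    { print }
    ' "$@"
}
```

It would be called as: strip_brackets gawk_test > gawk_modi. Because the closing ] is dropped together with the blanks before the opening [, this also avoids the stray space before > mentioned in the reply. An unclosed [ at the end of the file silently loses the held text in this sketch, so real input may need an END rule.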
