On Tue, May 22, 2007, Jörn Zaefferer wrote:

>  Dan G. Switzer, II wrote:
> >> This is a little off-topic, but when doing a regex search and replace
> >> within a text editor, how can I replace one character within a
> >> specific pattern?
> >>
> >> I want to get rid of newlines within <td> tags.  This finds them:
> >> <td>[^<]+(\r\n).+</td>
> >>
> >> How do I specify that I only want to replace the matched set?
> >>
> >
> > You group all the contents and then the replacement string are all the
> > matched sets pieced back together:
> >
> > sHtml.replace(/(<td>[^<]+)(\r\n)(.+</td>)/gi, "$1$3")
> >
>  If I got that right, you could even mark the second group to be skipped by
>  adding a colon:
>
>  sHtml.replace(/(<td>[^<]+)(:\r\n)(.+</td>)/gi, "$1$2")

The syntax requires a question mark: (?:...)
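
For illustration, a minimal sketch of the non-capturing form. The sample
string is made up. Because (?:...) creates no backreference, the trailing
group becomes $2, and the closing tag is written as <\/td> since a bare "/"
would end the regex literal.

```javascript
// Hypothetical sample input: one cell with a CR-LF inside it.
var sHtml = "<td>foo\r\nbar</td>";

// (?:\r\n) groups without capturing, so the last group is $2, not $3.
var result = sHtml.replace(/(<td>[^<]+)(?:\r\n)(.+<\/td>)/gi, "$1$2");

console.log(JSON.stringify(result));
```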

>  Or just skip the parentheses?
>
>  sHtml.replace(/(<td>[^<]+)\r\n(.+</td>)/gi, "$1$2")

Yes, but this IMHO is still too weak because...

1. The ".+" in this regex is greedy and matches too much: it runs to the
   last </td> it can reach, so a single match can swallow several
   <td>...</td> constructs instead of ending at the first closing tag.
   So one has to use at least ".+?" to fix this.

2. Additionally, I recommend using \r?\n to support both Windows
   CR-LF and Unix LF-only line endings.

3. I do not understand the [^<]+, as it would NOT allow removing the
   newlines when there is additional markup in the <td> container, as in
   "<td>...\n...<span>...</span>...</td>". I recommend replacing it
   with just ".*?".

4. The "+" quantifier should actually be "*", as it might be fully valid
   to have a "<td>\r\n</td>" container ;-)

5. The </td> has to be written escaped, as in <\/td>, because an
   unescaped "/" would end the regex literal.

6. As the "." regex character in JavaScript does NOT match newline
   characters, one has to use "(.|\r?\n)*" instead.

So, I recommend the following stronger version:

sHtml.replace(/(<td>.*?)\r?\n((?:.|\r?\n)*?<\/td>)/gi, "$1$2")

But even this still has the problem that it cannot remove MULTIPLE
occurrences of newlines in the SAME <td> container. If this should
also be handled, one has to use a replacement function:

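sHtml = sHtml.replace(
    // Lazy "*?" keeps each match inside one cell; a greedy "*" would run
    // on to the last </td> and also strip newlines between cells.
    /(<td>)((?:.|\r?\n)*?)(<\/td>)/gi,
    function ($0, $1, $2, $3) {
        // Remove every line break from the cell body only.
        return $1 + $2.replace(/\r?\n/g, "") + $3;
    }
);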

This should now be a strong enough version and finally
do what was requested...
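
As a hedged usage sketch of the callback approach: the helper name
stripCellNewlines and the input string are hypothetical, and the pattern in
this sketch matches each cell lazily so that line breaks between cells are
left alone.

```javascript
// Hypothetical helper: strips line breaks inside each <td>...</td> cell.
function stripCellNewlines(sHtml) {
    return sHtml.replace(
        // Lazy "*?" ends each match at the nearest </td>, one cell at a time.
        /(<td>)((?:.|\r?\n)*?)(<\/td>)/gi,
        function ($0, $1, $2, $3) {
            // Remove every CR-LF or LF from the cell body only.
            return $1 + $2.replace(/\r?\n/g, "") + $3;
        }
    );
}

// Made-up input: two multi-line cells separated by a line break.
var input = "<tr>\n<td>a\r\nb\nc</td>\n<td>d\ne</td>\n</tr>";
var output = stripCellNewlines(input);

console.log(JSON.stringify(output));
```

Cells that carry attributes, such as <td class="x">, are not matched by
this pattern; it only looks for a bare <td>.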

                                       Ralf S. Engelschall
                                       [EMAIL PROTECTED]
                                       www.engelschall.com
