At 09:42 21-6-2001 -0700, Jon wrote:
>Edwin,
>
>on 6/21/01 7:16 AM, "Edwin Martin" <[EMAIL PROTECTED]> wrote:
>
> > org.apache.regexp 1.2 is pretty much broken. It has some
> > major flaws since 1.0 and they are still not addressed.
> >
> > See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
> > for a list of bugs (BTW none of them is assigned).
>
>Sending in bug reports doesn't get the problems fixed. This is a community
>of VOLUNTEERS. You can't just magically put in a bug report and then someone
>is going to jump up and fix it...you have to submit patches or try to nicely
>motivate people to fix it for you.
>
><http://jakarta.apache.org/site/understandingopensource.html>
>
>"With the opensource system, if you find any deficiency in the project, the
>onus is on you to redress that deficiency."

I thought submitting bug reports is also an important
way to support Open Source.

Well, I looked at the regexp code and found one of the bugs:

RECompiler.java, line 664:

                     // Premature end of range. define up to Character.MAX_VALUE
                     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
                     {
                         simpleChar = Character.MAX_VALUE;
                         break;
                     }

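This code treats every minus as a range operator, even one that comes right
before the closing ']'.

So the RE "[a-]" means "the character a and every character after it" (a range
from 'a' up to Character.MAX_VALUE).

A minus at the beginning or the end of a character class should be just a
literal minus.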
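For comparison, here is how the JDK's own java.util.regex package (JDK 1.4 and later, not org.apache.regexp) treats the same pattern. This is only a reference point for the expected behaviour; the class and method names are made up for the illustration.

```java
import java.util.regex.Pattern;

// Reference behaviour from java.util.regex, NOT org.apache.regexp:
// a '-' right before the closing ']' is an ordinary character.
public class TrailingMinusDemo {
    // Returns true if the whole input matches the regex.
    public static boolean inClass(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-]", "a")); // true
        System.out.println(inClass("[a-]", "-")); // true
        System.out.println(inClass("[a-]", "z")); // false: no a..MAX_VALUE range
    }
}
```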

The code should be something like this:

                     // Premature end of range ("-]"): do not define a range
                     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
                     {
                         definingRange = false;
                         break;
                     }

Furthermore, RECompiler.java, line 697:

                 if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')

Should become something like:

                 if ((idx + 1) >= len || !(pattern.charAt(idx + 1) == '-'
                     && !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))

In other words: do not include a character when it is followed by a minus (it
starts a range), but DO include it when that minus is followed by a ']'.
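The rule can be checked in isolation. Below is a hypothetical standalone helper (the class and method names are made up; this is not RECompiler code) that evaluates the same condition, assuming idx points at the simple character inside the pattern string. It uses `idx + 2 < len`, so charAt(idx + 2) can never run past the end of the pattern.

```java
// Hypothetical standalone version of the include rule (not RECompiler code).
public class IncludeRule {
    // Returns true if the simple character at idx should be added on its own.
    public static boolean includeSimpleChar(String pattern, int idx) {
        int len = pattern.length();
        // Last character, or not followed by '-': include it.
        if (idx + 1 >= len || pattern.charAt(idx + 1) != '-') {
            return true;
        }
        // Followed by '-': include it only if that '-' is followed by ']'
        // ("x-]"), because then no range is being defined.
        return idx + 2 < len && pattern.charAt(idx + 2) == ']';
    }

    public static void main(String[] args) {
        System.out.println(includeSimpleChar("a-]", 0));  // true
        System.out.println(includeSimpleChar("a-z]", 0)); // false
        System.out.println(includeSimpleChar("ab]", 0));  // true
        System.out.println(includeSimpleChar("a-", 0));   // false, no exception
    }
}
```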

The code still does not handle a character class that starts with a minus,
like "[-a]" or "[^-a]", but that shouldn't be too difficult to implement.
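As an illustration of the intended semantics for both cases, here is a hypothetical, self-contained parser for the body of a character class (the text between '[' and ']'). It is not org.apache.regexp code, ignores escapes and a leading ']', and all names are made up; it only shows a '-' being taken literally when it comes first (after an optional '^') or last.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT org.apache.regexp code. Escapes are not handled.
public class CharClassSketch {
    private final boolean negated;
    // Each entry is an inclusive {lo, hi} pair; single chars use lo == hi.
    private final List<char[]> ranges = new ArrayList<>();

    public CharClassSketch(String body) {
        negated = body.startsWith("^");
        int len = body.length();
        int i = negated ? 1 : 0;
        while (i < len) {
            char lo = body.charAt(i);
            // "x-y" is a range only when a character follows the '-'.
            // A lone leading '-' ("-a") or a trailing "x-" is not followed
            // by "-y", so those characters are added as literals.
            if (i + 2 < len && body.charAt(i + 1) == '-') {
                char hi = body.charAt(i + 2);
                if (hi < lo) {
                    throw new IllegalArgumentException("Bad class range: " + lo + "-" + hi);
                }
                ranges.add(new char[] {lo, hi});
                i += 3;
            } else {
                ranges.add(new char[] {lo, lo});
                i += 1;
            }
        }
    }

    public boolean matches(char c) {
        boolean inClass = false;
        for (char[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                inClass = true;
                break;
            }
        }
        // A negated class matches exactly the characters not in the set.
        return inClass != negated;
    }

    public static void main(String[] args) {
        System.out.println(new CharClassSketch("a-").matches('z'));  // false
        System.out.println(new CharClassSketch("-a").matches('-'));  // true
        System.out.println(new CharClassSketch("^-a").matches('-')); // false
        System.out.println(new CharClassSketch("^-a").matches('b')); // true
    }
}
```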

It isn't really that hard to fix these bugs. I just wonder whether anybody is
responsible for the regexp package.

And by the way, you don't have to shout.

Bye,
Edwin Martin.