I need an algorithm which takes two regular expressions and produces a
third expression whose solutions are the intersection of the previous two
solutions, so that
...
Can the ORO do this?
This is not a feature of jakarta-oro.
daniel
I also noticed the same tools at http://www.savarese.org/oro/downloads/
This is the old software on which jakarta-oro is based and is no
longer supported.
The PerlTools have text/regex/Perl5StreamInput, and Jakarta source does not.
Are there any other significant differences?
See the CHANGES
The expression:
new Perl5Util().match(/[[:alpha:]]/, a)
returns false with ORO 2.0.3, whereas:
new Perl5Util().match(/[[:alnum:]]/, a)
returns true. I hope this isn't another newbie mistake...
No. You found a bug. Perl5Matcher doesn't have any code in there for
handling
In message [EMAIL PROTECTED], Andrea Palmieri writes:
I am new to Oro. I noticed that Perl5StreamInput and methods
manipulating Perl5StreamInput have been removed in the version 2.0. Is
there any other way to search text from a file?
Read in the entire file and do the search in memory. This is
jakarta-oro 2.0.4 is ready for download from
http://jakarta.apache.org/builds/jakarta-oro/release/v2.0.4/
The following URL summarizes the changes made between 2.0.3 and 2.0.4
http://cvs.apache.org/viewcvs/~checkout~/jakarta-oro/CHANGES?content-type=text/plain
This is a maintenance release,
In message 003d01c11c61$4c6ecda0$[EMAIL PROTECTED], Gabrio Verratti writes:
I'm using your ORO v1.1.0a API and I wanted to know what the difference
between contains() and matches() is. The latest javadoc (@ Jakarta)
suggests that the two are different in certain situations, but I'm not
In message [EMAIL PROTECTED], John Goalby writes:
Input string : HEADERAABBCCDD
Output : AA, BB, CC, DD
I have tried a number of things but cannot get it:
HEADER(.{2})*
This ONLY gives me DD.
Any way to get the groups for AA, BB, CC and DD?
Capturing parentheses will only save the last thing
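For the HEADERAABBCCDD case above, one way to collect every pair is to match the pairs one at a time instead of relying on a repeated group. This is only a sketch: it uses the JDK's java.util.regex as a stand-in rather than the jakarta-oro classes, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the ORO matcher.
class RepeatedGroupDemo {
    private static final String PREFIX = "HEADER";

    // A group inside a repetition is overwritten on each pass,
    // so only the final pair is still saved when the match ends.
    static String lastPairOnly(String input) {
        Matcher m = Pattern.compile("HEADER(.{2})*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Check the prefix once, then find each two-character pair separately.
    static List<String> allPairs(String input) {
        List<String> pairs = new ArrayList<>();
        if (!input.startsWith(PREFIX)) {
            return pairs;
        }
        Matcher m = Pattern.compile(".{2}").matcher(input);
        m.region(PREFIX.length(), input.length());
        while (m.find()) {
            pairs.add(m.group());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "HEADERAABBCCDD";
        System.out.println(lastPairOnly(input)); // the single saved group
        System.out.println(allPairs(input));     // every pair after the prefix
    }
}
```

The same idea should carry over to any matcher that can be asked repeatedly for the next match in an input.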
In message 000401c13152$92568eb0$[EMAIL PROTECTED], Joe Pardi writes:
I'm currently using the Regexp package to do it server-side but need the
library that best matches the JavaScript equivalent. Sounds like you
recommend I should use Perl5Util instead?
I only mentioned Perl5Util because
In message 006801c13ca3$4aa9eab0$2e02000a@frogger, Tracy Spiva writes:
..
Here is another simple program that shows what I'm seeing.
Well, it works for me. My best guess is that it either has something
to do with your locale settings or is a platform-dependent JVM bug.
I ran your program using
In message [EMAIL PROTECTED], Ranjeet Ganguli writes:
I have to use a lookbehind pattern like (?!foo)bar i.e., match 'bar' not
preceded by 'foo'. I understand that Perl5 does not allow that but Perl8
does (please correct me if I am wrong !) . Can anyone please let me know if
there is a
In message [EMAIL PROTECTED], writes:
Could you clarify the situation with a (regression, compatibility, etc.) test suite?
Is there any (I haven't found any, neither in the downloaded stuff nor on the website)?
There is no current test suite for jakarta-oro and it's one of the things
In message [EMAIL PROTECTED], Surfbird Fang writes:
Yes,I want to process HTML with DOM, but the ordinary HTML isn't
well-formed.
That's something of a problem I guess. A thing to keep in mind is not
to count on a single regular expression to do all of the work for
you when processing HTML.
I'm starting to use the oro packages I'd like to use the following code:
The problem is that result is null. Can anyone explain why?
You are using the wrong pattern. The pattern you used will work with
Perl5Util. Remove the m## parts
daniel
--
To unsubscribe, e-mail: mailto:[EMAIL
which I would expect to replace all occurrences of \par with \r\n, but
it ends up replacing \par with \\r\n. Essentially it is not replacing
It is replacing par with \r\n. You need par instead of \\par
If you don't understand why, consider that \n is handled as a Java
string by the Java
In message [EMAIL PROTECTED]rpa, Neil O'Toole writes:
I modified the regular expression to use SINGLELINE_MASK as described, and
this obviously shortened the RE quite a bit, with the result that the
StackOverflowError no longer occurs. Thanks for the pointers to the bug
We still have to produce
In message 001201c1800a$3d106f00$57d042d9@behrangsa, Behrang Saeedzadeh writes:
Does anybody know a useful tutorial on ORO?
The closest thing is the old OROMatcher 1.0 programmer's guide at
http://www.oroinc.com/developers/docs/OROMatcher/index.html
An updated guide is on the TODO list
It's all in the regular expression you pick. Perl5Compiler does not
currently support lookbehind assertions, which off the top of my head
is the only way I can see to do what you want. I'm assuming you want to
split on all occurrences of | not preceded by /. This requires context,
which is a
In message [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
Doh! The problem was that either Win2K or java.exe (I'm not sure which)
was globbing the command line for me. I didn't expect that on a non-UNIX box
Probably java.exe assuming you were using a command prompt and not a
cygwin bash
/^ Message start.*Message ends /s
This somehow does not return me the correct number of multiple lines. Any
better suggestions please?
You probably need to use the /ms modifier or just ditch the ^. /s by
itself means that ^ will only match at the beginning of the input, not
right
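The effect of the two modifiers can be seen in a small sketch. It uses java.util.regex rather than jakarta-oro, on the assumption that Pattern.DOTALL plays the role of /s and Pattern.MULTILINE the role of /m; the sample input and the class name are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the ORO matcher.
class LineAnchorDemo {
    // Returns true if the expression is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // The message does not start on the first line of the input.
        String input = "log line\nMessage start\nbody\nMessage ends\n";
        String anchored = "^Message start.*Message ends";
        String unanchored = "Message start.*Message ends";

        // /s only: '.' crosses newlines, but '^' is tied to the start of input.
        System.out.println(found(anchored, Pattern.DOTALL, input));
        // /ms: '^' may also match just after a newline.
        System.out.println(found(anchored, Pattern.DOTALL | Pattern.MULTILINE, input));
        // /s with the '^' dropped.
        System.out.println(found(unanchored, Pattern.DOTALL, input));
    }
}
```

Note that with a greedy .* and several messages in one input, a single match would run from the first start marker to the last end marker.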
In message [EMAIL PROTECTED], larry hamel writes:
private String match( String toMatch ) {
if ( mMatcher.matches(toMatch, mIP_Pattern ) ) {
You mean to use contains().
daniel
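The distinction behind that advice, sketched with java.util.regex as a stand-in: its matches() requires the whole input to match, while find() looks for a match anywhere in it, which is the behavior contains() is being recommended for here. The IP-like pattern and the names below are assumptions, since the original mIP_Pattern is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the ORO matcher.
class ContainsVsMatchesDemo {
    // Assumed stand-in for the poster's IP pattern: four dot-separated digit groups.
    private static final Pattern IP = Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");

    // True only when the entire input is an address.
    static boolean wholeInputMatches(String s) {
        return IP.matcher(s).matches();
    }

    // True when an address occurs somewhere in the input.
    static boolean containsMatch(String s) {
        return IP.matcher(s).find();
    }

    public static void main(String[] args) {
        String line = "host 10.0.0.1 is up";
        System.out.println(wholeInputMatches(line));
        System.out.println(containsMatch(line));
    }
}
```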
jakarta-oro 2.0.6 is ready for download from
http://jakarta.apache.org/builds/jakarta-oro/release/v2.0.6/
The following URL summarizes the changes made between 2.0.5 and 2.0.6
http://cvs.apache.org/viewcvs/~checkout~/jakarta-oro/CHANGES?content-type=text/plain
This is a maintenance release,
In message [EMAIL PROTECTED], Robert Edgar writes:
which has got me up to about 450 lines a second but that is still slow though
I am still using the readline, but using readline and a string tokenizer I
can get 10x this speed which seems to me to indicate that the readline is not
really a
In message [EMAIL PROTECTED], Malcolm Davis writes:
Is there any time to use Regular Expressions
when the format of the stream doesn't change?
Probably not if all you're doing is tokenizing based on a set of delimiters.
The equivalent of strtok() is O(n). A specific well-constructed regular
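As a rough illustration of plain delimiter tokenizing, here is a sketch that splits the same line with java.util.StringTokenizer (the JDK's closest relative of strtok()) and with a regular expression. The delimiter set and all names are assumptions made for the example, and it uses only JDK classes, not jakarta-oro.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class comparing two ways to split on fixed delimiters.
class DelimiterTokenDemo {
    // Single pass over the input, treating ',' and ';' as delimiters.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",;");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // The same split expressed as a regular expression over runs of delimiters.
    static List<String> withRegex(String line) {
        return Arrays.asList(line.split("[,;]+"));
    }

    public static void main(String[] args) {
        String line = "a,b;;c";
        System.out.println(withTokenizer(line));
        System.out.println(withRegex(line));
    }
}
```

The two agree on input like this; they can differ on edge cases such as a leading delimiter, where split() returns an empty first token and the tokenizer does not.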
In message [EMAIL PROTECTED], Steve Cohen writes:
apparently not. Granted, I am using a somewhat old version:
Patterns can be stored in Hashtables and HashMaps. This is done by the
PatternCache classes. Whatever you're running into is peculiar to HashSet.
HashSet.hashCode() is going into an
In message [EMAIL PROTECTED], Steve Cohen writes:
Very interesting. I actually solved my problem by eliminating the
HashSet (which I didn't need for patterns anyway) and replacing it with
LinkedList.
I just compared the implementations of HashMap/HashSet JDK 1.3.1 vs. 1.4,
and there were a
In message [EMAIL PROTECTED], Adrian Boyko writes:
But the slightly simpler case, below, doesn't seem to match the 2nd subgroup
correctly:
Perl5 Expression: (.)(?=(.))
Search Input: XY
Match 1: X
Subgroups:
1: X
2:
Shouldn't the second subgroup in the second
The source for Substitution.java is:
public interface Substitution {
public void appendSubstitution(StringBuffer appendBuffer, MatchResult match,
int substitutionCount,
PatternMatcherInput originalInput,
In message [EMAIL PROTECTED], Kevin Stussman writes:
All the state affecting methods are synchronized to avoid the maintenance
of explicit locks in multithreaded programs. This philosophy differs from
the org.apache.oro.text.regex package, where you are expected to either
maintain explicit
In message [EMAIL PROTECTED], Phillip Rhodes writes:
I am using a regex of v_.*?_ to get the matches, but my results are not
right. I get a hodgepodge of results, some right, some wrong.
For example, I will get a _you are_ not going to find this match_
It seems to pass the first _ in the string
In message [EMAIL PROTECTED], Dmitry Beransky writes:
That's what I originally thought. But the output I'm getting doesn't
support this. Given the code:
...
as you can see the offsets are not consecutive. Am I doing something wrong?
It would appear you have found a bug. It was probably
In message [EMAIL PROTECTED], stephan schmidt writes:
could you please list SnipSnap as an ORO regex user on the project page?
Done.
daniel
In message [EMAIL PROTECTED], Daniel Dekany writes:
According to ORO API docs, Perl5Substitution supports case
modification like \u or \l. I can't get it to work:
Works for me:
java substituteExample '(.*)' '\u$1' 'foo'
substitute regex: (.*)
result: Foo
In message [EMAIL PROTECTED], Daniel Dekany writes:
Not for me... I tried that now. It prints this for me:
substitute regex: '(.*)'
result: '\ufoo'
I download 2.0.6 and will try with that...
You may have a version earlier than 2.0.3 lurking somewhere in your
runtime environment. This happens
In message [EMAIL PROTECTED], Daniel Dekany writes:
This is odd. I'm sure that I had only ORO 2.0.5 in the classpath. I
simply ran the example from the command line (no servlet environment or
something), and the classpath was simply path\to\the\oro.jar;. And now
that I have replaced that jar with
$1\\2 (instead of ${1}2}
Is this a normal behavior (as I stated I failed to make it work with perl,
but nobody's perfect) or a transient bug in the escaping process ? I tried
Yes, this is a normal/intentional behavior. It is a deliberate deviation
from Perl because the Perl behavior is
jakarta-oro 2.0.7 has been released. It is immediately available for
download from:
http://www.apache.org/dist/jakarta/oro/
and within the next few days will become available for download from
all of the mirrors listed at:
http://jakarta.apache.org/site/binindex.cgi
In message 3E47BFA4.16787.F0DE1CF@localhost, Martin Thomas writes:
I'm using ORO 2.0.7 and I get a stack overflow exception with the following:
...
String expression = (\\(|\\)|^| |,|\\.|;)Baseline(.)*(\\(|\\)| |,|\\.|;|$
...
Any help / advice would be appreciated.
Use character classes instead
In message 3E47D9AA.4731.F738D09@localhost, Martin Thomas writes:
Actually, exactly the same occurs using:
String expression = Baseline(.)*;
Er, how long is the string you're matching against? It really helps
if you include the exact input you're using so that others can
reproduce your
In message 3E47EA64.9585.FB4E4ED@localhost, Martin Thomas writes:
OK, I've attached a test case that demonstrates the problem.
Thanks. Your original expression works just fine if you change (.)* to
(.*), even with the alternations and saved groups. However, alternations
tend to be inefficient
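Besides the stack depth issue discussed in this thread, the two spellings also save different text. The following sketch shows the difference with java.util.regex, used here as a stand-in for the ORO classes; the sample input and names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the ORO matcher.
class GroupPlacementDemo {
    // Returns whatever group 1 holds after the first match, or null if none.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Baseline 1.2";
        // (.)* repeats a one-character group, keeping only the last character.
        System.out.println(capture("Baseline(.)*", input));
        // (.*) is one group around the whole run, so it keeps all of it.
        System.out.println(capture("Baseline(.*)", input));
    }
}
```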
In message [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
Why is the 4th line: 4: null and not 4: C as I would have expected?
You have managed to identify a bug of the very worst kind that has been
hiding in the code for much too long. I just fixed it (deleted 3 characters
and added 2; it's
In message [EMAIL PROTECTED], Steve Holt writes:
Sadly in both WebLogic and JRun these libraries appear to take priority. In fact
I had no idea either had ORO installed and I dropped the ORO jar into the
WEB-INF/lib as usual as you say. In both cases they just used their own copies. In
Sad day
In message [EMAIL PROTECTED], Jerome Jacobsen writes:
isn't is a surprise and a big disappointment to me too. Man, they better make
this required in the next spec version. In the meantime I'll try and avoid
I just downloaded the Servlet 2.4 proposed final draft and the wording of
section 9.7.2
In message [EMAIL PROTECTED], Jerome Jacobsen writes:
You're a big name in the Java community. If you emailed
We must live in two different universes.
[EMAIL PROTECTED] do you think there's any chance they'd consider
changing this? I doubt it, especially this late in the process. But maybe
I've got an idea. Why doesn't akapoor help Kwok and Kwok help akapoor
and that way I don't have to look at either issue? :) Seriously
though, I'm flat out of time, so if someone else on the list doesn't
help out, you won't get an answer out of me for a couple of weeks.
These problems don't
In message [EMAIL PROTECTED], Sharpe, Cassandra writes:
Now the text that I need to split has pipe symbols embedded within it. For
example that text is ... Word A | Word B| Word| C. Is there a way to
escape the pipe symbol embedded within the word?
I don't understand the question. If
In message [EMAIL PROTECTED]g, [EMAIL PROTECTED] writes:
Stats:
oro 2.0.7
Websphere in Win2k
JDK 1.3.1x
It is possible that Websphere ships with its own (earlier) version of
jakarta-oro and isn't allowing you to override it the way Tomcat does.
Poke around Websphere and see if you can find it
In message [EMAIL PROTECTED], writes:
Just that..
http://jregex.sourceforge.net/gstarted-advanced.html#imatching
Oh, yes, that's the attitude we want to promote. Go use something
else instead of participating and submitting a simple patch. :) :) :)
Sarcasm aside, there's been
In message [EMAIL PROTECTED], Hainer, Neil writes:
This is my first attempt at using this package. I am getting the
following run time error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/oro/text/regex/MalformedPatternException
It compiles without error. Can anyone
In message [EMAIL PROTECTED], Jeroen Dijkmeijer writes:
I think it's very difficult to extend oro for this functionality.
It's actually quite easy to expose the position of the last non-matching
character. This is similar to the partial matching question that came
up a while back, that I also
--- Forwarded Message
Subject: Found the Cause of Problem with ORO and Weblogic
Date: Thu, 02 Oct 2003 06:14:37 +
We're currently using a licensed copy of PerlTools 1.1 and would like to
download the copy of PerlTools 1.2 from your savarese.org web page.
Unfortunately we get this error when we try:
Three comments:
1. PerlTools is unsupported. You should upgrade to jakarta-oro
In message [EMAIL PROTECTED], Timo Neumann writes:
What I have now is this:
^[\d\wäöüÄÖÜ\s]*$
Obviously that does not allow punctuation.
Any help?
From the perlre man page:
graph
Any alphanumeric or punctuation (special) character.
print
Any alphanumeric
In message [EMAIL PROTECTED], Jordi Salvat i Alabart writes:
First question (out of sheer curiosity): why is this later regexp faster
than the earlier one?
The expression is too long for me to analyze on a glance, but anything
you can do to rewrite a pattern that reduces backtracking will yield
In message [EMAIL PROTECTED], Chris Hyzer writes:
I need to have an anchor in the regex to the current
position of the PatternMatcherInput. Does anything
exist? When I use ^, it doesn't work as the
PatternMatcherInput is iterated through. Would this
be useful to add in there?
Anchors key off
In message [EMAIL PROTECTED], Tarun Ramakrishna Elankath writes:
My Perl5Pattern pattern (say ptn) is: \d{5}|\d{9}|\d{12}
...
I have input string, say zip5, zip9 and zip12 that are strings of digits
of length 5, 9 and 12 respectively.
When I use Perl5Matcher.matches(), zip5 passed, but zip9
In message [EMAIL PROTECTED], Marcin Augustyniak writes:
produces test\.test). When I changed the substitution pattern to
$1 the result in the first case was \. (correct) but in the app
$1 is the right pattern to use because \\$1 corresponds to \$1
in Perl and $1 corresponds to
In message [EMAIL PROTECTED], Robert Taylor writes:
I need to parse the search string into tokens in the manner that search engines would.
Lexical analysis (i.e., tokenization) and parsing are two separate activities.
Sometimes you can get away with combining the two, but you'll find you can
In message [EMAIL PROTECTED], Thomas Mitchell, Jr. writes:
We are using 2.0.4 presently and the manifest is essentially empty. I
just got the 2.0.8 jar and the manifest does have the version listed,
thanks.
Ouch! I checked the log and it doesn't look like we started adding
version info to the
In message [EMAIL PROTECTED], Gary Gregory writes:
Any ideas on timing for a 2.0.9?
I don't want to give the appearance of having the final word on this,
so this is just my speculation. I think the next release will be a 2.1
release. There's been some stuff on the TODO list for a while which
In message [EMAIL PROTECTED], jdijkme[EMAIL PROTECTED] writes:
It's not working because you are using a special character inside the set [],
so it is recognized as the set (\, s) i think. Use either [ \\n\\f\\r\\t]
or better, \\s (without the []).
All special backslashed characters except for
In message [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
given the expression: [0x0]
I would simply like to remove it from the text string.
Substitute it with a zero-length string.
given the expression: [\r]
I would like to replace it with \n
At this time, I can locate the carriage return
In message [EMAIL PROTECTED], Thomas Zillinger writes:
I understand that by design it seems that GlobFilenameFilter only evaluates
the pathname.getName() part of a file (see RegexFilenameFilter). I was
It's required by the FileFilter and FilenameFilter interfaces. That's
how they're expected to
I am using the OROMatcher regular expression library, which is bundled with
WebLogic Server. I am using com.oroinc.text.regex.PatternCompiler to compile
a regular expression. I got the following exception:
com.oroinc.text.regex.MalformedPatternException: Unreached characters at
end of
In message [EMAIL PROTECTED], Scott Deboy writes:
I'd like to use GlobCompiler and groups, but parens are being prefixed by a slash
since they're in the __isPerl5MetaCharacter method.
...
I can work around it by using a Perl5Compiler but I was curious if this was by
design, and if so, is there
In message [EMAIL PROTECTED], Sann, Stephan writes:
Been there - took a look at that. Let's say the code is a little bit
counterintuitive - especially when it comes to the nitty-gritty (where
...
Could you give a short clue where the backreferences are pasted back
in so I can apply the
In message [EMAIL PROTECTED], Kayiti Devanandam writes:
Please do find attached the test case for it. (testRegEx.java which can be
compiled with putting jakarta-oro-2.0.8.jar in the classpath.)
With both the 2.0.5 and 2.0.8 versions I am finding the following results:
For the stringPattern --
I wrote:
The result will not be bar, it's bart. If you want foo|foot to match
foot you've either got to rewrite it as foot|foo or as ^(?:foo|foot)$.
For example,
echo foot | perl -pi -e 's/^(?:foo|foot)$/bar/g'
I should clarify that this is only if you want the entire input to
match. If you
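The alternation-order point above can be reproduced outside Perl. This is a minimal sketch using java.util.regex, which also tries alternatives left to right; it is a stand-in, not jakarta-oro code, and the class name is made up.

```java
// Hypothetical demo class; String.replaceAll uses java.util.regex underneath.
class AlternationOrderDemo {
    // Replaces every match of the expression with "bar".
    static String replace(String regex, String input) {
        return input.replaceAll(regex, "bar");
    }

    public static void main(String[] args) {
        // foo wins first, so the trailing 't' is left over.
        System.out.println(replace("foo|foot", "foot"));
        // Longer alternative listed first.
        System.out.println(replace("foot|foo", "foot"));
        // Anchors force the whole input to match, so the matcher backtracks to foot.
        System.out.println(replace("^(?:foo|foot)$", "foot"));
    }
}
```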
In message [EMAIL PROTECTED], Kataria, Satish writes:
Hi,
I am getting a weird error when I am putting large amount of data in a
string datatype.
Change
<xsd:pattern value="(\S(\S|\s)*\S|\S)"/>
to
<xsd:pattern value="(\S(?:.*\S)?)"/>
In message [EMAIL PROTECTED], Just Lurker writes:
The expression in my previous post should read
^[[:ascii:][^[:cntrl:]\u0020[\].,;:@]]*$
On 10/3/05, Just Lurker [EMAIL PROTECTED] wrote:
Does ORO support the subtraction/intersection functionality?
No, that syntax is not supported.
I wrote:
You need to account for the newline because $ and ^ are zero-length
positional expressions (i.e., put \n between the $ and the ^). Keep
I forgot to add that the $ and ^ are redundant once you put the \n in
between them, so you should use ^LOOK FOR ME\nAND ME$ as the expression.
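A small sketch of why the newline has to appear in the expression, assuming multi-line matching is in effect. It uses java.util.regex with Pattern.MULTILINE as a stand-in for the ORO classes; the surrounding input lines and the names are invented.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for the ORO matcher.
class AdjacentLinesDemo {
    // Searches the input in multi-line mode, so ^ and $ work per line.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "header\nLOOK FOR ME\nAND ME\nfooter";
        // $ and ^ consume nothing, so the newline between the lines is never matched.
        System.out.println(found("^LOOK FOR ME$^AND ME$", input));
        // Matching the newline explicitly makes the inner anchors unnecessary.
        System.out.println(found("^LOOK FOR ME\\nAND ME$", input));
    }
}
```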
In message [EMAIL PROTECTED], Duke Tantiprasut writes:
Are there any plans to make getMatch() and group() threadsafe?
They are thread-safe. Concurrent calls to multiple methods will not leave
the object in an inconsistent state. What I think you're asking is for
the results to be
In message [EMAIL PROTECTED], Duke Tantiprasut writes:
I'm curious why there is such a significant jump from the Perl5Matcher
compared to the java.util.regex?
A hefty chunk of that time comes from converting strings to char[] before
matching. I've tuned that benchmark before and trimmed 25% of
In message [EMAIL PROTECTED], CJ Jouhal writes:
Pattern m_forbiddenTagsWithContentPattern =
s_perlCompiler.compile(
(script|object|applet|style|noscript)[^]*[\\s\\S]*?/\1[^]*,
Perl5Compiler.CASE_INSENSITIVE_MASK