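The RE class is not thread-safe: each RE instance keeps the state of its last match, so a single static RE shared between threads gets corrupted.
Share a static REProgram instead, and create a new RE from it wherever you need to match...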

try this...

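import org.apache.regexp.RE;
import org.apache.regexp.RECompiler;
import org.apache.regexp.REProgram;

public class Myclass {

    // Compiled once; only the REProgram is shared between threads.
    private static REProgram my_program = null;

    static {
        RECompiler compiler = new RECompiler();
        my_program = compiler.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
    }

    public void myMethod() {
        ...
        // Create a new RE per call: each RE keeps its own match state,
        // so no two threads ever touch the same one.
        RE re = new RE(my_program);
        ...
    }
}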
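
For comparison, java.util.regex (in the JDK since 1.4) has the same split: a compiled Pattern can be shared freely, while a Matcher holds the state of one match and should be created per call. The sketch below uses those JDK classes rather than Jakarta Regexp, so it shows the idea and is not a drop-in replacement; the class name UrlParts and the method path are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlParts {
    // Compiled once and shared: a Pattern is immutable, so any number of
    // threads may use it at the same time.
    private static final Pattern URL = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns group 5 (the path), or null if href does not match.
    static String path(String href) {
        // A Matcher holds the state of one match, so create one per call
        // and never keep it in a static field.
        Matcher m = URL.matcher(href);
        return m.matches() ? m.group(5) : null;
    }

    public static void main(String[] args) throws InterruptedException {
        final String href = "http://roseqoloredlenses.blogspot.com/atom.xml";
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            // Every worker calls path() concurrently; each call gets its own Matcher.
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 10000; n++) {
                    path(href);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(path(href));
    }
}
```

The match and the group lookup both happen on the local Matcher, so a second thread cannot start a new match between the two steps.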

have a good time
Celestino :)

Rob Beckett wrote:

Hi!

I've struggled over this one for quite a while, and I'm hitting rock bottom, so I've joined the mailing list to appeal to you for help, and perhaps in future offer some help of my own. :)

The code I'm working with involves the indexing spiders for blogshares.com, a virtual stock market utilizing weblogs as companies. Jakarta Regexp has been in use for some time to verify the URLs of blogs added to the system, and to perform additional tasks on those blogs as needed. Recently I added Regexp functions to the main link parser class itself, in this case taking a Vector of links from a parsed HTML page and running each one through an RE match for validity checking before performing database comparisons. However, after that change, the spiders' log (powered by Log4J!) began including some rather perplexing errors; two examples are included here. They occur on average every five minutes, and I'm rather stumped. :-/

The expression used in each case is initialized by:
---snip---
private static RE urlcheck = new RE("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
---snip---


1. Attempting the main RE match against the 'href' string. The match is only attempted if the string is non-empty and has passed the other checks beforehand.
---snip---
java.lang.StringIndexOutOfBoundsException: String index out of range: 58
at java.lang.String.charAt(String.java:444)
at org.apache.regexp.StringCharacterIterator.charAt(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchAt(Unknown Source)
at org.apache.regexp.RE.match(Unknown Source)
at org.apache.regexp.RE.match(Unknown Source)
at org.apache.regexp.RE.match(Unknown Source)
at com.blogshares.bot.LinkParser.parse(LinkParser.java:667)
at com.blogshares.bot.LinkParser.workerThread(LinkParser.java:1007)
at com.blogshares.bot.LinkParser.run(LinkParser.java:1193)
at java.lang.Thread.run(Thread.java:534)
---snip---


The input string:
---snip---
http://doorknot.blogspot.com/2005/01/shady-in-de-alpen.html#comments
---snip---

LinkParser.java:667:
---snip---
boolean validURL = urlcheck.match(href);
---snip---

2. Attempting to retrieve the fifth parenthesized group from the matched URL.

Error Log:
---snip---
java.lang.StringIndexOutOfBoundsException: String index out of range: -14
at java.lang.String.substring(String.java:1444)
at org.apache.regexp.StringCharacterIterator.substring(Unknown Source)
at org.apache.regexp.RE.getParen(Unknown Source)
at com.blogshares.bot.LinkParser.parse(LinkParser.java:679)
at com.blogshares.bot.LinkParser.workerThread(LinkParser.java:1007)
at com.blogshares.bot.LinkParser.run(LinkParser.java:1193)
at java.lang.Thread.run(Thread.java:534)
---snip---


The input string:
---snip---
http://roseqoloredlenses.blogspot.com/atom.xml
---snip---

LinkParser.java:679:
---snip---
String hrefPath  = urlcheck.getParen(5);
---snip---

This is a multi-threaded application, running on the J2SDK V1.4.2 & Jakarta Regexp V1.3, tested on more than one box with the same results. Any ideas?

Thanks & your time is appreciated,

Rob Beckett
http://www.blogshares.com/


