Paul: I have recently been made aware of your significant contributions to the Lucene project.
I own a highly focused and radically specialized talent agency, Core Search Group. Our sole mission is to represent the top software development talent in the US to the companies who are integrating open-wource technology into their strategy to develop the best systems in their industry. We have a client who is very interested in using Lucene or something similar as their springboard into a new industry. And we work with many other companies who would be interested in your background. I'm sure you have had the experience of trying to deal with subpar recruiters and want to assure you this is different. With 5 years of experience working only for people like you, the "rock stars" of software development, we know how you want to be treated. My only objective at this point is to talk with you and learn about your career goals and aspirations. Then my goal is to help you reach your goals. When can we start with a phone call? Thank you, Dave Freireich President Core Search Group "Where software development talent is number one" www.coresearchinc.com Original Message: ----------------- From: Paul Elschot [EMAIL PROTECTED] Date: Thu, 26 Aug 2004 01:11:48 +0200 To: [EMAIL PROTECTED] Subject: Re: Regexp Query On Wednesday 25 August 2004 22:06, Naseeruddin Mohammad wrote: > I want to develop regular expression based search for lucene. Figured > that I need to write a Query which processes it for regular > expression. I am planning to use apache regexp api. > I need suggestions and tips in doing so. > > I think it is analogs as WildCharQuery. Or PrefixQuery. One possibility is to split the regex into a constant prefix and the remainder. Then walk the terms (much as in PrefixQuery) that have the prefix and match their remainders to the remaining regex. You then have all the terms to pass to a BooleanQuery. There is a default maximum of 1024 clauses there, so it might make sense to have a minimum length for the constant prefix. (The maximum nr. of clauses exists because each search Term requires a buffer.) Regards, Paul Elschot P.S. This might be useful, just ignore the defined class name and its superclass. (It's a part of earlier posted code.) /* Licensed under the Apache License, Version 2.0 */ import org.apache.lucene.index.Term; import org.apache.lucene.index.TermEnum; import org.apache.lucene.index.IndexReader; import java.io.IOException; import java.util.regex.Pattern; import java.util.regex.Matcher; public class SrndTruncQuery extends SimpleTerm { public SrndTruncQuery(String truncated, char unlimited, char mask) { super(false); /* not quoted */ this.truncated = truncated; this.unlimited = unlimited; this.mask = mask; truncatedToPrefixAndPattern(); } private final String truncated; private final char unlimited; private final char mask; private String prefix; private Pattern pattern; public String getTruncated() {return truncated;} public String toStringUnquoted() {return getTruncated();} protected boolean matchingChar(char c) { return (c != unlimited) && (c != mask); } protected void appendRegExpForChar(char c, StringBuffer re) { if (c == unlimited) re.append(".*"); else if (c == mask) re.append("."); else re.append(c); } protected void truncatedToPrefixAndPattern() { int i = 0; while ((i < truncated.length()) && matchingChar(truncated.charAt(i))) { i++; } prefix = truncated.substring(0, i); StringBuffer re = new StringBuffer(); while (i < truncated.length()) { appendRegExpForChar(truncated.charAt(i), re); i++; } pattern = Pattern.compile(re.toString()); } public void visitMatchingTerms( IndexReader reader, String fieldName, MatchingTermVisitor mtv) throws IOException { boolean expanded = false; int prefixLength = prefix.length(); TermEnum enumerator = reader.terms(new Term(fieldName, prefix)); Matcher matcher = pattern.matcher(""); try { do { Term term = enumerator.term(); if (term != null) { String text = term.text(); if ((! text.startsWith(prefix)) || (! term.field().equals(fieldName))) { break; } else { matcher.reset( text.substring(prefixLength)); if (matcher.matches()) { mtv.visitMatchingTerm(term); expanded = true; } } } } while (enumerator.next()); } finally { enumerator.close(); matcher.reset(); } if (! expanded) { System.out.println("No terms in " + fieldName + " field for: " + toString()); } } } --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -------------------------------------------------------------------- mail2web - Check your email from the web at http://mail2web.com/ . --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]