This is the code for searching:
String index = "index";
String field = "contents";
IndexReader reader = IndexReader.open(index);
Searcher searcher = new IndexSearcher(reader);
System.out.println("Enter query: ");
String line = ".IN."; // in Jakarta Regexp this is like * IN *
RegexQuery rxquery = new RegexQuery(new Term(field, line));
Hits hits = searcher.search(rxquery);
if (hits != null) {
    for (int k = 0; k < 100 && k < hits.length(); k++) {
        if (hits.doc(k) != null)
            System.out.println(hits.doc(k).getField("contents").stringValue());
    }
}
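For reference, the pattern itself can be checked outside Lucene. The following is a minimal sketch using only java.util.regex (no Lucene classes), run against the two example phrases from the quoted message below. The class name PatternCheck and its helper methods are made up for illustration. It contrasts the three matching modes that come up in this thread: find() (match anywhere), lookingAt() (match must begin at the first character), and String.matches() (the whole string must match).

```java
import java.util.regex.Pattern;

public class PatternCheck {
    // True if the pattern matches anywhere in the text (unanchored).
    static boolean findsAnywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // True if the pattern matches starting at the first character,
    // as if the pattern began with "^".
    static boolean matchesAtStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    // True only if the pattern matches the entire text.
    static boolean matchesWhole(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String[] phrases = {"Knowing yourself", "Old clinic"};
        String[] patterns = {".in.", ".*in.*"};
        for (String regex : patterns) {
            for (String phrase : phrases) {
                System.out.println(regex + " on \"" + phrase + "\":"
                        + " find=" + findsAnywhere(regex, phrase)
                        + " lookingAt=" + matchesAtStart(regex, phrase)
                        + " matches=" + matchesWhole(regex, phrase));
            }
        }
    }
}
```

With these lowercase patterns, ".in." is found inside both phrases by find() but fails lookingAt() and matches() for both, while ".*in.*" succeeds in all three modes. Note that "." stands for exactly one character, so ".in." is not the same as a wildcard "*in*".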
And this is the part of creating the index:
File directory = new File("index");
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
        true,
        IndexWriter.MaxFieldLength.LIMITED);
// getRecords() returns a list of record values from the database;
// all of them are phrases
List<String> records = getRecords();
Iterator<String> i = records.iterator();
while (i.hasNext()) {
    Document doc = new Document();
    doc.add(new Field(field, i.next(), Field.Store.YES,
            Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();
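As a rough illustration of why the indexing choice matters here: RegexQuery tests the regular expression against the indexed terms of the field, and a NOT_ANALYZED field is indexed as one single term (the whole phrase, case preserved), whereas an analyzed field is split into separate word terms. The sketch below uses plain Java only, with no Lucene classes. The class TermSketch, its methods, and the lowercase-and-split-on-whitespace step are hypothetical stand-ins (StandardAnalyzer does more than that), and the anchored lookingAt() check simply mirrors the behaviour described in the quoted message below for the default java.util.regex implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class TermSketch {
    // Stand-in for a NOT_ANALYZED field: the whole value is a single term.
    static List<String> singleTerm(String value) {
        return Collections.singletonList(value);
    }

    // Crude stand-in for an analyzed field (assumption, not the real
    // StandardAnalyzer): lowercase the value and split it on whitespace.
    static List<String> wordTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Returns the terms the regex matches from their first character,
    // i.e. an anchored-at-start match on each term.
    static List<String> matchingTerms(String regex, List<String> terms) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).lookingAt()) {
                matched.add(term);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String record = "Old clinic";
        String regex = "cl.n";
        System.out.println("single term: " + matchingTerms(regex, singleTerm(record)));
        System.out.println("word terms:  " + matchingTerms(regex, wordTerms(record)));
    }
}
```

Here the single-term variant yields no match, because the only term is "Old clinic" and it does not start with "cl", while the word-term variant matches "clinic". Whether this is what happens in the real index depends on the analyzer and the regex implementation in use.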
This code does what I want, except that it only matches against the first
word of each phrase. I think the problem is in how the index is built, but I
don't know how to fix it...
Any ideas?
Thank you so much!!
Steven A Rowe wrote:
>
> On 5/8/2009 at 9:13 AM, Ian Lee wrote:
>> I'm surprised that it matches either - don't you need ".*in" where .*
>> means match any character zero or more times? See the javadoc for
>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>> package.
>>
>> Unless you're an expert in regexps it is probably worth playing with
>> them outside your lucene code to start with e.g. with simple
>> String.matches(regexp) calls. They can take some getting used to.
>> And try to avoid anything with backslashes if you can!
>
> The java.util.regex.Pattern implementation (the default RegexQuery
> implementation) actually uses Matcher.lookingAt(), which is equivalent to
> prepending a "^" anchor to the beginning of the pattern, so if Huntsman84
> is using the default implementation, then I agree with Ian: I'm surprised
> it matches either.
>
> However, the Jakarta Regexp implementation uses RE.match(), which does
> *not* require a beginning-of-string match.
>
> Huntsman84, are you using the Jakarta Regexp implementation? If so, then
> like you, I'm surprised it's not matching both :).
>
> It would be useful to see some real code, including how you index your
> records.
>
> Steve
>
>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <[email protected]>
>> wrote:
>> >
>> > Hi,
>> >
>> > I am using RegexQuery for searching in a set of records which are
>> > phrases of several words each. My aim is to find any phrase that
>> > contains the given group of letters (e.g. "in"). For that case,
>> > I am building the query with the regular expression ".in.", so it
>> > should return all phrases which contain "in", but the search only
>> > matches with the first word of the phrase.
>> >
>> > For example, if my records are "Knowing yourself" and "Old
>> > clinic", the correct search would return 2 matches, but it only
>> > matches with "Knowing yourself".
>> >
>> > How could I fix this?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
>
--
View this message in context:
http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.