StandardAnalyzer lowercases terms at index time, but the term you pass
to the term enum contains an uppercase character, so it can never
match anything in the index. If you index with an analyzer and then
walk the terms with a term enum, the term you pass to the term enum
must match the output of the analyzer.
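To make the mismatch concrete, here is a minimal sketch in plain Java that
does not use Lucene at all. The analyze() method is only a stand-in that
splits on whitespace and lowercases, which is an assumption about what
matters here, not a reproduction of StandardAnalyzer; the class and method
names are made up for illustration, and whole-term matching is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class AnalyzedTermMatch {
    // Stand-in for analysis (assumption): split on whitespace, lowercase each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Return the indexed terms that the regex matches in full.
    static List<String> matchingTerms(String regex, List<String> indexedTerms) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Cat Cot Cit Cet");
        // Uppercase 'C' cannot match the lowercased terms.
        System.out.println(matchingTerms("C.t", indexed)); // prints []
        // A lowercase pattern matches what was actually stored.
        System.out.println(matchingTerms("c.t", indexed)); // prints [cat, cot, cit, cet]
    }
}
```

In the real code the equivalent change would be to pass "c.t" (or "c?t" for
the wildcard version) instead of "C.t".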
- Mark
[EMAIL PROTECTED] wrote:
To begin with, thanks for your quick response.
I am still trying to write this code, and so far I have this:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.regex.*;
public class TmpMain {
    public static void main(String[] args) throws Exception {
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES,
                Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();
        IndexReader reader = IndexReader.open(directory);
        RegexTermEnum regexTermEnum = new RegexTermEnum(reader,
                new Term("field", "C.t"), new JavaUtilRegexCapabilities());
        /*
        // Second version:
        WildcardTermEnum regexTermEnum = new WildcardTermEnum(reader,
                new Term("field", "C?t"));
        */
        while (regexTermEnum.next()) {
            System.out.println("Next:");
            System.out.println(regexTermEnum.term().text());
        }
        System.out.println("End.");
    }
}
But the output of this code is:
End.
As you can see above, I have also tried changing the line:
RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field",
"C.t"), new JavaUtilRegexCapabilities());
to
WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new Term("field",
"C?t"));
but the result is the same :/
What am I doing wrong?
Erick - you're not missing anything, except that the original poster
is after RegexQuery, not WildcardQuery. Both work basically the same
way, except in the pattern matching capabilities.
Erik
On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
Erik:
Well, you wrote the book <G>. But I thought something like this
would work:
TermDocs td = reader.termDocs();
WildcardTermEnum we = new WildcardTermEnum(reader, new Term("field", "c*t"));
while (we.next()) {
    td.seek(we);
    while (td.next()) {
        // report that document td.doc() contains term we.term()
    }
}
Although I admit I haven't tried it, so I could be totally off base.
What am I missing?
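The idea of enumerating matching terms and then looking up their documents
can be sketched without Lucene. This is only an illustration under stated
assumptions: the sorted map stands in for the index's term dictionary and
postings, the class and method names are hypothetical, and the wildcard
translation covers just * and ?.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class WildcardPostings {
    // Translate a wildcard pattern into a regex: * = any run of characters, ? = exactly one.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Walk the terms in sorted order and collect the doc ids of every matching term.
    static Map<String, List<Integer>> report(
            SortedMap<String, SortedSet<Integer>> postings, String wildcard) {
        Pattern pattern = wildcardToRegex(wildcard);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, SortedSet<Integer>> entry : postings.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical postings: term -> ids of the documents containing it.
        SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>();
        postings.put("cat", new TreeSet<>(Arrays.asList(0, 1)));
        postings.put("cot", new TreeSet<>(Arrays.asList(0)));
        postings.put("coat", new TreeSet<>(Arrays.asList(2)));
        postings.put("dog", new TreeSet<>(Arrays.asList(1)));
        System.out.println(report(postings, "c*t")); // prints {cat=[0, 1], coat=[2], cot=[0]}
    }
}
```

Note that the sketch visits every matching term, including the first one.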
Erick
On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Erick - I think you're mixing things up with WildcardQuery.
RegexQuery does support all regex capabilities (depending on the
underlying regex matcher used).
A couple of techniques you could use to achieve the goal:
* Use RegexTermEnum, though that'll give you the terms across the
  entire index, so maybe in your use case you could index a single
  document into a RAMDirectory and RegexTermEnum on it.
* Try out SpanRegexQuery and use getSpans() to get the exact matches.
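As a rough illustration of the first technique (handling one document at a
time so that matches can be attributed to a file), here is a plain-Java
sketch that does not use Lucene. The file names and contents are invented,
the whitespace-and-lowercase analysis is an assumption, and unlike a term
enum, which yields each distinct term once, this lists every occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class PerFileMatches {
    // Return one "file --> word" line for every token of the text that the regex matches in full.
    static List<String> matchesInFile(String fileName, String text, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> lines = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Assumed analysis step: lowercase each token before matching.
            String term = token.toLowerCase(Locale.ROOT);
            if (pattern.matcher(term).matches()) {
                lines.add(fileName + " --> " + term);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical input files and their contents.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "Cat Cot dog");
        files.put("b.txt", "cat CAT cot");
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String line : matchesInFile(file.getKey(), file.getValue(), "c.t")) {
                System.out.println(line);
            }
        }
    }
}
```

The pattern is written in lowercase for the same reason as before: it has
to match the terms as they look after analysis.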
Erik
On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
First, the period (.) isn't part of the syntax, so make sure you look
more carefully at the Lucene syntax...
Then, you might be able to use WildcardTermEnum to find
the terms that match and TermDocs to find the documents
that contain those terms.
There's nothing built into Lucene to do this out of the box; you
have to roll your own.
Best
Erick
On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hello.
Let's assume that I have this code in my application:
(...)
Query query = new RegexQuery(new Term("field", "C.T"));
// searching...
(...)
And now I would like to know whether my application found "cat" or
"cot" or something else. How can I check what was found by my
application?
I would like to write an application like this:
INPUT -> regular expression
OUTPUT -> file ---> word
example: INPUT = "C.T"
OUTPUT =
a.txt --> CAT
a.txt --> COT
b.txt --> CAT
b.txt --> CAT
b.txt --> COT
(...)
So, how do I check which words were found in particular Documents
after searching? I see that the Hits class contains only the found
documents and nothing more (I am new to this technology, so I may be
wrong...)
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]