Hello All,
I am using Lucene 3.4. I need to find locations of query hits in a document. What I've
implemented works fine for textual queries but does not work for phone numbers.
Here's how I index my docs:
String oc = "Joe dialed 800-555-1212 but got a busy signal";
doc.add(new Field("contents",
                  oc,
                  Field.Store.NO,
                  Field.Index.ANALYZED,
                  Field.TermVector.WITH_POSITIONS_OFFSETS));
Now, here is how I find locations. I search for a query. If I get a hit, I split
my query (in case it's multi-word) into words and search for each of them using
TermFreqVector like this:
//String qstr = "my multiword query"; // for queries like this it works fine...
String qstr = "800-555-1212"; // ...but not for ones like this
Query query = parser.parse(qstr);
TopDocs results = searcher.search(query, Integer.MAX_VALUE);
ScoreDoc[] hits = results.scoreDocs;
String[] subTerms = qstr.split("\\s+"); // phone string stays intact here
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document doc = searcher.doc(docId);
    TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
    TermPositionVector tpvector = (TermPositionVector) tfvector;
    for (String subTerm : subTerms) {
        String subq = subTerm.toLowerCase();
        int termidx = tfvector.indexOf(subq); // get termidx = -1 here
        TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);
        for (int j = 0; j < tvoffsetinfo.length; j++) {
            int offsetStart = tvoffsetinfo[j].getStartOffset();
            int offsetEnd = tvoffsetinfo[j].getEndOffset();
            // ...
For a query like "800-555-1212", tfvector.indexOf returns -1. What am I doing
wrong?
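
One possible explanation, which I have not confirmed: if the analyzer splits the
phone number into several terms at index time, the term vector never contains
"800-555-1212" as a single term, so a lookup of the whole string would return -1.
The stand-alone sketch below only illustrates that idea. It does not call Lucene;
the class TermLookupSketch, its methods, and the regex split are all hypothetical
stand-ins, and the real analyzer may tokenize differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Lucene.
final class TermLookupSketch {

    // Assumed stand-in for an analyzer: lowercases the text and splits it on
    // any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty string a leading delimiter produces
            }
        }
        return tokens;
    }

    // Builds a sorted array of unique terms, loosely like a term vector's term list.
    static String[] uniqueSortedTerms(List<String> tokens) {
        return new TreeSet<String>(tokens).toArray(new String[0]);
    }

    // Returns the index of the term in the sorted array, or -1 if it is absent.
    static int indexOf(String[] sortedTerms, String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        String[] terms = uniqueSortedTerms(
                tokenize("Joe dialed 800-555-1212 but got a busy signal"));

        // Looking up the raw query string, as split("\\s+") would leave it.
        System.out.println("800-555-1212 -> " + indexOf(terms, "800-555-1212"));

        // Looking up the pieces produced by running the query through the
        // same stand-in tokenizer that was applied to the document text.
        for (String part : tokenize("800-555-1212")) {
            System.out.println(part + " -> " + indexOf(terms, part));
        }
    }
}
```

If this is what is happening, the sub-terms would need to come from the same
analysis that was applied at index time rather than from a whitespace split, and
the code would need to skip any term whose index is -1 before asking for offsets.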
Thanks,
Ilya Zavorin