Hello All, I am using Lucene 3.4. I need to find the locations of query hits in a document. What I've implemented works fine for textual queries but does not work for phone numbers.
Here's how I index my docs:

    String oc = "Joe dialed 800-555-1212 but got a busy signal";
    doc.add(new Field("contents", oc, Field.Store.NO, Field.Index.ANALYZED,
                      Field.TermVector.WITH_POSITIONS_OFFSETS));

Now, here is how I find locations. I search for a query. If I get a hit, I split my query (in case it's multi-word) into words and look up each of them using TermFreqVector like this:

    //String qstr = "my multiword query"; // for queries like this it works fine...
    String qstr = "800-555-1212";         // ...but not for ones like this
    Query query = parser.parse(qstr);
    TopDocs results = searcher.search(query, Integer.MAX_VALUE);
    ScoreDoc[] hits = results.scoreDocs;
    String[] subTerms = qstr.split("\\s+"); // phone string stays intact here

    for (int i = 0; i < hits.length; i++) {
        int docId = hits[i].doc;
        Document doc = searcher.doc(docId);
        TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
        TermPositionVector tpvector = (TermPositionVector) tfvector;

        for (String subTerm : subTerms) {
            String subq = subTerm.toLowerCase();
            int termidx = tfvector.indexOf(subq); // get termidx = -1 here
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

            for (int j = 0; j < tvoffsetinfo.length; j++) {
                int offsetStart = tvoffsetinfo[j].getStartOffset();
                int offsetEnd = tvoffsetinfo[j].getEndOffset();
                // ...

For a query like "800-555-1212", tfvector.indexOf returns -1. What am I doing wrong?

Thanks,
Ilya Zavorin
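
One thing that may be worth checking is whether the terms stored in the term vector match the whitespace-split query words at all. The sketch below does not use Lucene. It uses a hypothetical stand-in tokenizer (`standInAnalyze`, an assumption made for illustration, not the rules of any real Lucene analyzer) that breaks text on every non-alphanumeric character, and a plain `List.indexOf` in place of `tfvector.indexOf`. Under that assumption, the intact phone string is never one of the indexed terms, while its individual digit groups are. Whether the analyzer actually used at index time behaves this way has not been verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TermLookupSketch {

    // Mirrors the posted code: lowercase the raw query and split on whitespace only.
    static List<String> whitespaceTerms(String query) {
        return Arrays.asList(query.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Hypothetical stand-in for an analyzer (NOT Lucene's tokenization rules):
    // lowercases the text and breaks it on anything that is not a letter or digit.
    static List<String> standInAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String content = "Joe dialed 800-555-1212 but got a busy signal";
        String query = "800-555-1212";

        // Stand-in for the terms stored in the document's term vector.
        List<String> indexedTerms = standInAnalyze(content);

        // Lookup with whitespace-split query terms, as in the posted code.
        for (String term : whitespaceTerms(query)) {
            System.out.println(term + " -> " + indexedTerms.indexOf(term));
        }

        // Lookup with query terms produced by the same stand-in tokenizer.
        for (String term : standInAnalyze(query)) {
            System.out.println(term + " -> " + indexedTerms.indexOf(term));
        }
    }
}
```

If the real analyzer does split the number, the same idea would apply: derive the sub-terms by running the query string through the analyzer used for indexing rather than through `split("\\s+")`, and skip any term whose index comes back as -1 before asking for offsets.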