Hello All,

I am using Lucene 3.4 and need to find the locations of query hits in a 
document. What I've implemented works fine for textual queries but does not 
work for phone numbers. 

Here's how I index my docs:

String oc = "Joe dialed 800-555-1212 but got a busy signal";
doc.add(new Field("contents", 
                oc, 
                Field.Store.NO,
                Field.Index.ANALYZED, 
                Field.TermVector.WITH_POSITIONS_OFFSETS));


Now, here's how I find locations. I search for a query. If I get a hit, I split 
my query (in case it's multi-word) into words and look up each of them using 
TermFreqVector like this:


//String qstr = "my multiword query";   // for queries like this it works fine...
String qstr = "800-555-1212";   // ...but not for ones like this
Query query = parser.parse(qstr);
TopDocs results = searcher.search(query, Integer.MAX_VALUE);
ScoreDoc[] hits = results.scoreDocs;

String[] subTerms = qstr.split("\\s+"); // phone string stays intact here

for (int i = 0; i < hits.length; i++) {
        int docId = hits[i].doc;
        Document doc = searcher.doc(docId);
        
        TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");  
        TermPositionVector tpvector = (TermPositionVector)tfvector;   
        
        for (String subTerm : subTerms)
        {
                String subq = subTerm.toLowerCase();
                int termidx = tfvector.indexOf(subq);  // get termidx = -1 here
                
                TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);
                for (int j = 0; j < tvoffsetinfo.length; j++) {
                        int offsetStart = tvoffsetinfo[j].getStartOffset();
                        int offsetEnd = tvoffsetinfo[j].getEndOffset();
                        // ...

For a query like "800-555-1212", tfvector.indexOf returns -1. What am I doing 
wrong? 
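
To illustrate what I suspect might be happening, here is a minimal plain-Java 
sketch (not Lucene code). It assumes the analyzer breaks the phone number on 
hyphens at index time, which I have not confirmed for my setup. The tokenize 
and indexOf helpers are hypothetical stand-ins: tokenize only mimics that 
assumed analyzer behavior, and indexOf only mimics the "-1 when the term is 
missing" behavior I see from TermFreqVector.indexOf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermLookupSketch {

    // Hypothetical stand-in for an analyzer: lowercases the text and splits on
    // anything that is not a letter or digit. A real analyzer may differ.
    static String[] tokenize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        String[] out = terms.toArray(new String[0]);
        Arrays.sort(out); // keep terms sorted so binary search works
        return out;
    }

    // Hypothetical stand-in for a term lookup: index of the term, or -1 if absent.
    static int indexOf(String[] sortedTerms, String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }

    public static void main(String[] args) {
        String[] terms = tokenize("Joe dialed 800-555-1212 but got a busy signal");

        // The whole phone string is not a term under this assumption, so this is -1.
        System.out.println("800-555-1212 -> " + indexOf(terms, "800-555-1212"));

        // Running the query through the same tokenizer yields terms that are found.
        for (String part : tokenize("800-555-1212")) {
            System.out.println(part + " -> " + indexOf(terms, part));
        }
    }
}
```

If that assumption holds, splitting the query on whitespace alone would never 
produce a string that matches an indexed term, whereas passing the query 
through the same tokenization as the document would.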

Thanks,

Ilya Zavorin

