Hi Sachin, I would like to look at your indexing code. Please share it.
- Kumaran R

On Tue, Aug 19, 2014 at 7:18 PM, Sachin Kulkarni <kulk...@hawk.iit.edu> wrote:
> Hi,
>
> Sorry for all the code; it got sent out accidentally.
>
> The following code is part of the Benchmark utility in Lucene, specifically
> SubmissionReport.java.
>
> // Here reader is the IndexReader.
>
> Iterator itr = docMap.entrySet().iterator();
> int totalNumDocuments = reader.numDocs();
> ScoreDoc sd[] = td.scoreDocs;
> String sep = " \t ";
> DocNameExtractor docext = new DocNameExtractor(docNameField);
> for (int i = 0; i < sd.length; i++)
> {
>     String docName = docext.docName(searcher, sd[i].doc);
>     // ***** The Map of documents will help us get the docid
>     int indexedDocID = docMap.get(docName);
>     Fields fields = reader.getTermVectors(indexedDocID);
>     Iterator<String> strItr = fields.iterator();
>
>     /// ********** The following while loop prints the field names, which
>     /// only show 2 of the 5 fields that I am looking for.
>     while (strItr.hasNext())
>     {
>         String fieldName = strItr.next();
>         System.out.println("next field " + fieldName);
>     }
>     Document DocList = reader.document(indexedDocID);
>     List<IndexableField> field_list = DocList.getFields();
>
>     /// ****** The following for loop prints the five fields and their
>     /// related information.
>     for (int j = 0; j < field_list.size(); j++)
>     {
>         System.out.println("list field is : " + field_list.get(j).name());
>         IndexableFieldType IFT = field_list.get(j).fieldType();
>         System.out.println(" Field storeTermVectorOffsets : " +
>             IFT.storeTermVectorOffsets());
>         System.out.println(" Field stored :" + IFT.stored());
>     }
>     // ***************************** //
> }
>
>
> /**** THE OUTPUT for this section of code is
> fields size : 2
> next field body
> next field docname
>
> list field is : docid
> Field storeTermVectorOffsets : false
> list field is : docname
> Field storeTermVectorOffsets : false
> list field is : docdate
> Field storeTermVectorOffsets : false
> list field is : doctitle
> Field storeTermVectorOffsets : false
> list field is : body
> Field storeTermVectorOffsets : false
>
> *******/
>
> Hope this code comes out legible in the email.
>
> Thank you.
>
> Regards,
> Sachin Kulkarni
>
>
> On Tue, Aug 19, 2014 at 8:39 AM, Sachin Kulkarni <kulk...@hawk.iit.edu>
> wrote:
>
> > Hi Kumaran,
> >
> > The following code is part of the Benchmark utility in Lucene,
> > specifically SubmissionReport.java
> >
> > Iterator itr = docMap.entrySet().iterator();
> > int totalNumDocuments = reader.numDocs();
> > ScoreDoc sd[] = td.scoreDocs;
> > String sep = " \t ";
> > DocNameExtractor docext = new DocNameExtractor(docNameField);
> > for (int i = 0; i < sd.length; i++)
> > {
> >     System.out.println("i = " + i);
> >     String docName = docext.docName(searcher, sd[i].doc);
> >     System.out.println("docName : " + docName + "\t map size " +
> >         docMap.size());
> >     // ***** The Map will help us get the docid and
> >     int indexedDocID = docMap.get(docName);
> >     System.out.println("indexed doc id : " + indexedDocID + "\t docname : "
> >         + docName);
> >     // ******** GET THE tf-idf data now ************ //
> >     Fields fields = reader.getTermVectors(indexedDocID);
> >     System.out.println("fields size : " + fields.size());
> >     // **** Print log output for testing **** //
> >     Iterator<String> strItr = fields.iterator();
> >     while (strItr.hasNext())
> >     {
> >         String fieldName = strItr.next();
> >         System.out.println("next field " + fieldName);
> >     }
> >     Document DocList = reader.document(indexedDocID);
> >     List<IndexableField> field_list = DocList.getFields();
> >     for (int j = 0; j < field_list.size(); j++)
> >     {
> >         System.out.println("list field is : " + field_list.get(j).name());
> >         IndexableFieldType IFT = field_list.get(j).fieldType();
> >         System.out.println(" Field storeTermVectorOffsets : " +
> >             IFT.storeTermVectorOffsets());
> >         //System.out.println(" Field stored :" + IFT.stored());
> >         //for (FieldInfo.IndexOptions c : IFT.indexOptions().values())
> >         //    System.out.println(c);
> >     }
> >     // ***************************** //
> >
> >
> > On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <
> > kums....@gmail.com> wrote:
> >
> >> Hi Sachin Kulkarni,
> >>
> >> If possible, please share your code.
> >>
> >> -
> >> Kumaran R
> >>
> >>
> >> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <kulk...@hawk.iit.edu>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am using Lucene 4.6.0.
> >> >
> >> > I have been storing 5 fields for my documents in the index, namely
> >> > body, title, docname, docdate and docid.
> >> >
> >> > But when I get the fields using IndexReader.getTermVectors(indexedDocID),
> >> > I only get the docname and body fields and can retrieve the term vectors
> >> > for those fields, but not for the others.
> >> >
> >> > I checked whether all five fields are stored using
> >> > IndexableFieldType.stored(), and all return true. I also checked that
> >> > all the fields are indexed, and they are, but when I call
> >> > getTermVectors I still only receive two fields back.
> >> >
> >> > Is there some other config setting that I am missing while indexing
> >> > that is causing this behavior?
> >> >
> >> > Thanks to Kumaran and Ian for their answers to my previous questions,
> >> > but I have not been able to figure out the above one yet.
> >> >
> >> > Thank you very much.
> >> >
> >> > Regards,
> >> > Sachin
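In Lucene 4.x, term vectors are a per-field, index-time option set on the field's FieldType (setStoreTermVectors, setStoreTermVectorPositions, setStoreTermVectorOffsets). A field being stored and indexed does not by itself give it term vectors, which would be consistent with getTermVectors returning only docname and body. Below is a minimal sketch, assuming lucene-core and lucene-analyzers-common 4.6.0 on the classpath; the class name TermVectorDemo, the throwaway RAMDirectory index and the sample field values are hypothetical, for illustration only. It indexes one document in which only body has vectors, then checks which fields have them, both index-wide via FieldInfo.hasVectors() and per document via IndexReader.getTermVectors(). As far as I know, the FieldType reported for fields of a document loaded with reader.document() is rebuilt from the stored-fields data and does not carry every index-time flag, so storeTermVectorOffsets() printing false there is not conclusive about how a field was indexed. Which fields the benchmark actually indexes with vectors depends on its indexing configuration, which is not checked here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

// Hypothetical demo class (Lucene 4.6 API assumed).
public class TermVectorDemo {

  // Stored, indexed, tokenized text (copied from TextField.TYPE_STORED)
  // with term vectors, positions and offsets switched on explicitly.
  static final FieldType TEXT_WITH_VECTORS = new FieldType(TextField.TYPE_STORED);
  static {
    TEXT_WITH_VECTORS.setStoreTermVectors(true);
    TEXT_WITH_VECTORS.setStoreTermVectorPositions(true);
    TEXT_WITH_VECTORS.setStoreTermVectorOffsets(true);
    TEXT_WITH_VECTORS.freeze();
  }

  /** Indexes one document, then returns the fields that have term vectors for it. */
  static List<String> vectorFieldsOfFirstDoc() throws IOException {
    Directory dir = new RAMDirectory();
    IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_46,
        new StandardAnalyzer(Version.LUCENE_46));
    IndexWriter writer = new IndexWriter(dir, cfg);

    Document doc = new Document();
    // Stored and indexed, but no term vectors (the default for these types).
    doc.add(new StringField("docid", "42", Field.Store.YES));
    doc.add(new TextField("doctitle", "a sample title", Field.Store.YES));
    // Stored, indexed and with term vectors.
    doc.add(new Field("body", "term vectors are enabled per field", TEXT_WITH_VECTORS));
    writer.addDocument(doc);
    writer.close();

    DirectoryReader reader = DirectoryReader.open(dir);
    try {
      // Index-wide view: which fields were indexed with vectors at all.
      for (FieldInfo fi : MultiFields.getMergedFieldInfos(reader)) {
        System.out.println(fi.name + " hasVectors=" + fi.hasVectors());
      }

      // Per-document view: only fields with vectors are returned here.
      List<String> names = new ArrayList<String>();
      Fields fields = reader.getTermVectors(0); // null if the doc has no vectors
      if (fields != null) {
        for (String field : fields) {
          names.add(field);
          TermsEnum te = fields.terms(field).iterator(null); // 4.x takes a reuse argument
          BytesRef term;
          while ((term = te.next()) != null) {
            // In a term vector, totalTermFreq() is the term's count in this document.
            System.out.println(field + ":" + term.utf8ToString() + " tf=" + te.totalTermFreq());
          }
        }
      }
      return names;
    } finally {
      reader.close();
      dir.close();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("fields with term vectors: " + vectorFieldsOfFirstDoc());
  }
}
```

If a check like this against the real index reports hasVectors=false for docid, docdate and doctitle, the fix would be on the indexing side (giving those fields a FieldType with setStoreTermVectors(true) and reindexing), since term vectors are only written at index time.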