Hi,
Merry Christmas!!
For a Boolean query such as 'sql AND server', I am using QueryParser to get
the documents that contain both 'sql' and 'server'.
Inside the for loop in the code below I get the correct documents, but to get
the frequency I have to sum the frequencies of 'sql' and 'server' individually
with the help of termDocs.read().
I am searching through millions of documents, and calculating the frequency
this way takes about 160 seconds for 80k documents.
Is there any way to get the frequency of a Boolean query directly, without
this manipulation? For single-term and phrase queries, I get the frequencies
for 80k documents within 10 seconds.
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse("sql AND server");
TopDocs docs = searcher.search(query, null, n);
TermDocs termDocs = reader.termDocs();
termDocs.seek(new Term(field, query.toString(field).split(" ")[0].toLowerCase()));
int[] ints = new int[docs.totalHits];
int[] ints1 = new int[docs.totalHits];
termDocs.read(ints, ints1);
List lsts = Arrays.asList(ArrayUtils.toObject(ints));
List lsts1 = Arrays.asList(ArrayUtils.toObject(ints1));
termDocs.seek(new Term(field, query.toString(field).split(" ")[1].toLowerCase()));
int[] inta = new int[docs.totalHits];
int[] inta1 = new int[docs.totalHits];
termDocs.read(inta, inta1);
List lsta = Arrays.asList(ArrayUtils.toObject(inta));
List lsta1 = Arrays.asList(ArrayUtils.toObject(inta1));
int totalFreq = 0;
int docId = -1;
String a = null;
int contId = 0;
String path=null;
for (int i = 0; i < docs.scoreDocs.length; i++) {
    docId = docs.scoreDocs[i].doc;
    path = reader.document(docId).get("path");
    // File name without its directory or extension
    a = path.substring(path.lastIndexOf("\\") + 1, path.lastIndexOf("."));
    try {
        if (a.indexOf("e") > -1) {
            // Drop the last character before parsing
            contId = Integer.parseInt(a.substring(0, a.length() - 1));
        } else {
            contId = Integer.parseInt(a);
        }
    } catch (Exception e) {
        // e.printStackTrace();
    }
    if (lsts.indexOf(docId) > -1 && lsta.indexOf(docId) > -1) {
        // Both terms occur in this document: add their frequencies
        totalFreq = (Integer) lsts1.get(lsts.indexOf(docId)) + (Integer) lsta1.get(lsta.indexOf(docId));
    } else if (lsts.indexOf(docId) > -1) {
        totalFreq = (Integer) lsts1.get(lsts.indexOf(docId));
    } else if (lsta.indexOf(docId) > -1) {
        totalFreq = (Integer) lsta1.get(lsta.indexOf(docId));
    }
    w.write(contId + "\t" + ID + "\t" + totalFreq + "\t" + path + "\n");
}
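One part of the loop above that may add to the running time, independent of
the query type, is the repeated List.indexOf() lookup: each call scans the
whole list, so the cost grows with hits times list size. Below is a minimal
sketch of the same summing step with hash maps instead. It assumes the doc IDs
and frequencies have already been read into parallel arrays, as
termDocs.read(docs, freqs) does, and that count is the number of entries
actually read. The class name TermFreqLookup, its method names, and the sample
numbers are hypothetical; the sketch makes no Lucene calls and has not been
timed against the real index.

```java
import java.util.HashMap;
import java.util.Map;

final class TermFreqLookup {

    /** Builds a docId -> frequency map from the first `count` entries of two parallel arrays. */
    static Map<Integer, Integer> toMap(int[] docIds, int[] freqs, int count) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(Math.max(16, count * 2));
        for (int i = 0; i < count; i++) {
            map.put(docIds[i], freqs[i]);
        }
        return map;
    }

    /** Sums the two terms' frequencies for one document; a term missing from the document counts as 0. */
    static int combinedFreq(Map<Integer, Integer> first, Map<Integer, Integer> second, int docId) {
        Integer f1 = first.get(docId);
        Integer f2 = second.get(docId);
        return (f1 == null ? 0 : f1) + (f2 == null ? 0 : f2);
    }

    public static void main(String[] args) {
        // Made-up postings for illustration only (not taken from a real index).
        Map<Integer, Integer> sql = toMap(new int[] {1, 4, 7}, new int[] {2, 1, 5}, 3);
        Map<Integer, Integer> server = toMap(new int[] {4, 7, 9}, new int[] {3, 1, 4}, 3);

        // Stand-in for the doc IDs that the Boolean query matched.
        int[] hits = {4, 7};
        for (int docId : hits) {
            System.out.println(docId + "\t" + combinedFreq(sql, server, docId));
        }
    }
}
```

With this shape, each hit costs two constant-time map lookups rather than
several list scans. Whether that accounts for most of the 160 seconds would
need to be measured.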
Thanks & Regards,
Ranjit Kumar
Associate Software Engineer
US: +1 408.540.0001
UK: +44 208.099.1660
India: +91 124.474.8100 | +91 124.410.1350
FAX: +1 408.516.9050
http://www.otssolutions.com