Hi,
This is my first version, it isn't fast, because I want to get this
information without modifying index.
Now I'm working to improve it (including freeling).
public String docsTerme(IndexReader reader, String terme) {
String resultat = "";
TermPositions tP;
ArrayList alDocs = new ArrayList();
long start = new Date().getTime();
int veinsTrobats = 0; //neightbours find it
//Where is the term
try {
tP = reader.termPositions(new Term("contingut", terme));
//Documents where the term is found.
while (tP.next()) {
infoTerme it = new infoTerme(terme, tP.doc(), tP.freq());
resultat += it.toString();
for (int i = 0; i < it.getFrequencia(); i++) {
it.add(tP.nextPosition());
}
alDocs.add(it); //we store: term, document id, positions
resultat += "(" + it.toStringPosicions() + ")<br/>";
}
} catch (IOException e) {
System.out.println("Error trobant documents termes: " + e);
return null;
}
//Terms in a document
for (int i = 0; i < alDocs.size(); i++) {
infoTerme iT = (infoTerme) alDocs.get(i); //We need term,id
document and positions
resultat += "<br/>" + iT.getId_document() + ":<br/>"; //Id
document
try {
TermFreqVector[] tfv =
reader.getTermFreqVectors(iT.getId_document()); //All the terms found in a
document
int j = 0;
String[] llistatTermes = tfv[j].getTerms();
int paraulesAnalitzades = 0;
veinsTrobats = 0;
while (veinsTrobats < iT.getFrequencia() &&
paraulesAnalitzades < llistatTermes.length) {
resultat += "," + llistatTermes[paraulesAnalitzades];
TermPositions termP = reader.termPositions(new
Term("contingut", llistatTermes[paraulesAnalitzades]));//Documents on
apareix el terme
while (termP.next()) {
if (termP.doc() == iT.getId_document()) { //The word
it's found in the same id document, maybe neightbours
boolean veins = false;
int ind = 0;
while (!veins && ind < termP.freq()) {
int posicio = termP.nextPosition();
if (iT.sonVeins(posicio)) {
veins = true;
resultat += "<br/>" + veinsTrobats + "/"
+ iT.getFrequencia() + " They are neightbours (proximity 1):" +
iT.getTerme() + " i " + llistatTermes[paraulesAnalitzades] + "(" + posicio +
")<br/>";
veinsTrobats++;
} else {
ind++;
}
}
}
}
paraulesAnalitzades++;
}
} catch (IOException e) {
System.out.println("Error I cant find terms: " + e);
return null;
}
}
long end = new Date().getTime();
resultat += "<br/>Time elapsed: " + (end - start) + "ms";
return resultat;
}
http://www.nabble.com/file/p20265608/infoTerme.java infoTerme.java
thank you,
Albert
Aleksander M. Stensby wrote:
>
> From what I can understand, you want to insert the word "history" and
> then
> get proposed "related" terms in combination with your input query.
> In essense this would be to do a "look-up" on top-terms in the subset of
> documents matching the initial query "history". Exactly how you could do
> this is a bit uncertain from my knowledge, but I suggest you read up on
> term-frequency and the tf-idf scheme.
>
> Also: take a look at the org.apache.lucene.search.similar package:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/package-summary.html
> and read the motivation email listed in the first segment of
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/MoreLikeThis.html
>
> I couldn't really see how you would autocomplete after the word history
> without listing a bunch of un-interesting terms as suggestions... But i
> might be wrong... Of course, if it was autocompletion you were looking
> for¸ Asbjørn answered that one just fine:)
>
> Best regards,
> Aleksander M. Stensby
>
>
> On Thu, 09 Oct 2008 18:49:26 +0200, Asbjørn A. Fellinghaug
> <[EMAIL PROTECTED]> wrote:
>
>> Albert Juhe:
>>>
>>> Hi,
>>>
>>> I want to make a wizard that can help to find n-grams terms.
>>> For example:
>>> If i want to search History, after write it the system propose you the
>>> following searches:
>>> history europe
>>> history spain
>>> history .....
>>> Consulting the terms indexed.
>>>
>>> Does it exits in Lucene?
>>
>> Hi.
>>
>> I interpret your question in such a way that you want autocompletion in
>> your search system? In that case, I believe there are some Analyzer's
>> which does this in the 'contrib' package. Also, I've created an Analyzer
>> which creates "bigrams" (n-gram of size 2) in my master thesis.
>> Feel free to download it from this page:
>> http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
>>
>> Also, have a look at the package org.apache.lucene.analysis.ngram:
>> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html
>>
>
>
>
> --
> Aleksander M. Stensby
> Senior Software Developer
> Integrasco A/S
> +47 41 22 82 72
> [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
--
View this message in context:
http://www.nabble.com/wizard-for-search-in-Lucene-tp19900220p20265608.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]