Hi,

This is my first version. It isn't fast, because I want to get this information without modifying the index. I'm now working on improving it (including FreeLing).
// Imports needed by this method (Lucene 2.x API)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositions;

public String docsTerme(IndexReader reader, String terme) {
    StringBuilder resultat = new StringBuilder();
    List<infoTerme> alDocs = new ArrayList<infoTerme>();
    long start = System.currentTimeMillis();

    // Step 1: find the documents that contain the term, with its positions.
    try {
        TermPositions tP = reader.termPositions(new Term("contingut", terme));
        try {
            while (tP.next()) {
                infoTerme it = new infoTerme(terme, tP.doc(), tP.freq());
                resultat.append(it.toString());
                for (int i = 0; i < it.getFrequencia(); i++) {
                    it.add(tP.nextPosition());
                }
                alDocs.add(it); // we store: term, document id, positions
                resultat.append("(").append(it.toStringPosicions()).append(")<br/>");
            }
        } finally {
            tP.close(); // the original never released the enumeration
        }
    } catch (IOException e) {
        System.out.println("Error finding documents for term: " + e);
        return null;
    }

    // Step 2: for each of those documents, look for neighbouring terms.
    for (int i = 0; i < alDocs.size(); i++) {
        infoTerme iT = alDocs.get(i); // term, document id and positions
        int docId = iT.getId_document();
        resultat.append("<br/>").append(docId).append(":<br/>");
        try {
            // Ask for the "contingut" vector explicitly; taking element 0 of
            // getTermFreqVectors() could return the vector of another field.
            TermFreqVector tfv = reader.getTermFreqVector(docId, "contingut");
            if (tfv == null) {
                continue; // no term vector stored for this document
            }
            String[] llistatTermes = tfv.getTerms(); // all terms in the document
            int paraulesAnalitzades = 0; // words analysed
            int veinsTrobats = 0;        // neighbours found
            while (veinsTrobats < iT.getFrequencia() && paraulesAnalitzades < llistatTermes.length) {
                String candidat = llistatTermes[paraulesAnalitzades];
                resultat.append(",").append(candidat);
                TermPositions termP = reader.termPositions(new Term("contingut", candidat));
                try {
                    // Jump straight to this document instead of scanning every
                    // document in which the candidate term appears.
                    if (termP.skipTo(docId) && termP.doc() == docId) {
                        for (int ind = 0; ind < termP.freq(); ind++) {
                            int posicio = termP.nextPosition();
                            if (iT.sonVeins(posicio)) {
                                resultat.append("<br/>").append(veinsTrobats).append("/")
                                        .append(iT.getFrequencia())
                                        .append(" They are neighbours (proximity 1):")
                                        .append(iT.getTerme()).append(" and ").append(candidat)
                                        .append("(").append(posicio).append(")<br/>");
                                veinsTrobats++;
                                break; // one hit per candidate term is enough
                            }
                        }
                    }
                } finally {
                    termP.close();
                }
                paraulesAnalitzades++;
            }
        } catch (IOException e) {
            System.out.println("Error: I can't find terms: " + e);
            return null;
        }
    }

    long end = System.currentTimeMillis();
    resultat.append("<br/>Time elapsed: ").append(end - start).append("ms");
    return resultat.toString();
}

infoTerme.java: http://www.nabble.com/file/p20265608/infoTerme.java

Thank you,
Albert


Aleksander M. Stensby wrote:
>
> From what I can understand, you want to enter the word "history" and
> then get proposed "related" terms in combination with your input query.
> In essence this would be a "look-up" of the top terms in the subset of
> documents matching the initial query "history". I am not sure exactly how
> you could do this, but I suggest you read up on term frequency and the
> tf-idf scheme.
>
> Also: take a look at the org.apache.lucene.search.similar package:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/package-summary.html
> and read the motivation email listed in the first segment of
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/MoreLikeThis.html
>
> I couldn't really see how you would autocomplete after the word "history"
> without listing a bunch of uninteresting terms as suggestions... But I
> might be wrong... Of course, if it was autocompletion you were looking
> for, Asbjørn answered that one just fine :)
>
> Best regards,
> Aleksander M. Stensby
>
>
> On Thu, 09 Oct 2008 18:49:26 +0200, Asbjørn A. Fellinghaug
> <[EMAIL PROTECTED]> wrote:
>
>> Albert Juhe:
>>>
>>> Hi,
>>>
>>> I want to make a wizard that helps to find n-gram terms.
>>> For example:
>>> If I want to search for "History", after I type it the system proposes
>>> the following searches:
>>> history europe
>>> history spain
>>> history .....
>>> by consulting the indexed terms.
>>>
>>> Does this exist in Lucene?
>>
>> Hi.
>>
>> I interpret your question to mean that you want autocompletion in
>> your search system?
>> In that case, I believe there are some Analyzers in the 'contrib'
>> package that do this. Also, I created an Analyzer that produces
>> "bigrams" (n-grams of size 2) for my master's thesis.
>> Feel free to download it from this page:
>> http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
>>
>> Also, have a look at the package org.apache.lucene.analysis.ngram:
>> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html
>>
>
> --
> Aleksander M. Stensby
> Senior Software Developer
> Integrasco A/S
> +47 41 22 82 72
> [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

--
View this message in context: http://www.nabble.com/wizard-for-search-in-Lucene-tp19900220p20265608.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
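
For reference, the idea discussed in this thread (proposing "history europe", "history spain" once the user has typed "history") comes down to counting which terms directly follow the query term and ranking them by frequency. The following is only a minimal sketch in plain Java, without Lucene. The class name NextTermSuggester and the method suggest are hypothetical, and the lowercase-and-split tokenization is an assumption standing in for whatever Analyzer the real index uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or infoTerme.java.
final class NextTermSuggester {

    /** Returns up to max terms that directly follow term, most frequent first. */
    static List<String> suggest(List<String> docs, String term, int max) {
        final Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Assumption: lowercasing and splitting on non-letters replaces a real Analyzer.
            String[] tokens = doc.toLowerCase().split("[^\\p{L}]+");
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (tokens[i].equals(term)) {
                    counts.merge(tokens[i + 1], 1, Integer::sum); // count the follower
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Descending count; ties broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "History Europe is long",
                "history spain",
                "the history europe course",
                "history of art");
        for (String next : suggest(docs, "history", 3)) {
            System.out.println("history " + next);
        }
    }
}
```

As the replies above point out, raw follower counts will also surface uninteresting terms such as "of"; weighting by tf-idf or filtering stop words would be the natural next step, and against a real index the positions would come from TermPositions rather than from re-tokenizing the text.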