I found the main issue. I was converting the BytesRef to a String without using its offset and length, so each word picked up leftover bytes that a previous, longer term had left in the reused buffer (which is where sourceWords like "lordne", "applee" and "solres" came from). This one-line change fixed the problem:

String word = new String(ref.bytes, ref.offset, ref.length);
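In case it helps anyone else, here is a minimal sketch of the corrected loop pulled out into its own method. The class and method names (DictionaryWords, wordsFor) are made up for the example, and it assumes the same LuceneDictionary / BytesRefIterator API that the quoted code below already compiles against:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefIterator;

public class DictionaryWords {

    /**
     * Collects the dictionary terms of one field, decoding each BytesRef with
     * its offset and length so no stale buffer bytes leak into the words.
     */
    static List<String> wordsFor(IndexReader reader, String field) throws IOException {
        List<String> words = new ArrayList<String>();
        LuceneDictionary dict = new LuceneDictionary(reader, field);
        BytesRefIterator iter = dict.getWordsIterator();
        BytesRef ref;
        while ((ref = iter.next()) != null) {
            // BytesRef.bytes is an oversized, reused buffer: new String(ref.bytes)
            // decodes the whole array, including leftovers from previously read
            // (longer) terms -- that is where "lordne" and "solres" came from.
            String word = new String(ref.bytes, ref.offset, ref.length, "UTF-8");
            words.add(word);
        }
        return words;
    }
}

If your version of BytesRef has utf8ToString(), ref.utf8ToString() does the same conversion in one call.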
Thank you.

On Fri, Jun 22, 2012 at 6:26 PM, Mansour Al Akeel <mansour.alak...@gmail.com> wrote:
> Hello all,
>
> I am trying to write a simple autosuggest functionality. I was looking
> at some auto-suggest code, and came across this post:
> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
> I have been stuck on some strange words, trying to see how they are
> generated. Here's the Analyzer:
>
> public class AutoCompleteAnalyzer extends Analyzer {
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = null;
>         result = new StandardTokenizer(Version.LUCENE_36, reader);
>         result = new EdgeNGramTokenFilter(result,
>                 EdgeNGramTokenFilter.Side.FRONT, 1, 20);
>         return result;
>     }
> }
>
> And this is the relevant method that does the indexing. It's being
> called with reindexOn("title");
>
> private void reindexOn(String keyword) throws CorruptIndexException,
>         IOException {
>     log.info("indexing on " + keyword);
>     Analyzer analyzer = new AutoCompleteAnalyzer();
>     IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36,
>             analyzer);
>     IndexWriter analyticalWriter = new IndexWriter(suggestIndexDirectory,
>             config);
>     analyticalWriter.commit(); // needed to create the initial index
>     IndexReader indexReader = IndexReader.open(productsIndexDirectory);
>     Map<String, Integer> wordsMap = new HashMap<String, Integer>();
>     LuceneDictionary dict = new LuceneDictionary(indexReader, keyword);
>     BytesRefIterator iter = dict.getWordsIterator();
>     BytesRef ref = null;
>     while ((ref = iter.next()) != null) {
>         String word = new String(ref.bytes);
>         int len = word.length();
>         if (len < 3) {
>             continue;
>         }
>         if (wordsMap.containsKey(word)) {
>             String msg = "Word " + word + " Already Exists";
>             throw new IllegalStateException(msg);
>         }
>         wordsMap.put(word, indexReader.docFreq(new Term(keyword, word)));
>     }
>
>     for (String word : wordsMap.keySet()) {
>         Document doc = new Document();
>         Field field = null;
>         field = new Field(SOURCE_WORD_FIELD, word, Field.Store.YES,
>                 Field.Index.NOT_ANALYZED);
>         doc.add(field);
>         field = new Field(GRAMMED_WORDS_FIELD, word, Field.Store.YES,
>                 Field.Index.ANALYZED);
>         doc.add(field);
>         String count = Integer.toString(wordsMap.get(word));
>         field = new Field(COUNT_FIELD, count, Field.Store.NO,
>                 Field.Index.NOT_ANALYZED); // count
>         doc.add(field);
>         analyticalWriter.addDocument(doc);
>     }
>     analyticalWriter.commit();
>     analyticalWriter.close();
>     indexReader.close();
> }
>
> private static final String GRAMMED_WORDS_FIELD = "words";
> private static final String SOURCE_WORD_FIELD = "sourceWord";
> private static final String COUNT_FIELD = "count";
>
> And now, my unit test:
>
> @BeforeClass
> public static void setUp() throws CorruptIndexException, IOException {
>     String idxFileName = "myIndexDirectory";
>     Indexer indexer = new Indexer(idxFileName);
>     indexer.addDoc("Apache Lucene in Action");
>     indexer.addDoc("Lord of the Rings");
>     indexer.addDoc("Apache Solr in Action");
>     indexer.addDoc("apples and Oranges");
>     indexer.addDoc("apple iphone");
>     indexer.reindexKeywords();
>     search = new SearchEngine(idxFileName);
> }
>
> The strange part is that, looking under the index, I found sourceWords
> like (lordne, applee, solres). I understand that the ngram filter will
> produce parts of each word, e.g.:
>
> l
> lo
> lor
> lord
>
> But all of these go into one field, so what about "lorden" and "solres"?
> I checked the docs for this, and looked into Jira, but didn't find
> relevant info. Is there something I am missing?
>
> I understand there could be easier ways to create this functionality
> (http://wiki.apache.org/lucene-java/SpellChecker), but I would like to
> resolve this issue and understand whether I am doing something wrong.
>
> Thank you in advance.