Hey Ian. Thank you so much for the quick reply. I'll definitely give Lucene a shot. I'll start with it and get back to you if I run into any problems.
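For reference, here is a minimal sketch of the approach Ian describes in the quoted reply. It uses one Lucene document per synonym, following the layout in the original question. The synonym is indexed as an exact, untokenized term, and the base word and ID are stored alongside it but not searched. The sketch assumes Lucene 9.5 or later on the classpath (for `ByteBuffersDirectory` and `IndexReader.storedFields()`) and Java 16+ for records. The class name `SynonymIndex`, the field names `synonym`, `word` and `id`, and the `Row`/`Entry` records are hypothetical and do not come from the thread.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical exact-match lookup: synonym -> (base word, ID). Assumes Lucene 9.5+. */
public final class SynonymIndex implements AutoCloseable {

    /** One row of the source table: ID, WORD, SYNONYMS. */
    public record Row(int id, String word, List<String> synonyms) {}

    /** What a lookup returns: the base word and its ID. */
    public record Entry(String word, int id) {}

    private final Directory dir;
    private final DirectoryReader reader;
    private final IndexSearcher searcher;

    private SynonymIndex(Directory dir) throws IOException {
        this.dir = dir;
        this.reader = DirectoryReader.open(dir);
        this.searcher = new IndexSearcher(reader);
    }

    /** Writes one document per synonym into a heap-resident directory, then opens a reader. */
    public static SynonymIndex build(List<Row> rows) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (Row row : rows) {
                for (String synonym : row.synonyms()) {
                    Document doc = new Document();
                    // Indexed as one untokenized, case-sensitive term; the value itself is not stored.
                    doc.add(new StringField("synonym", synonym, Field.Store.NO));
                    // Stored only: returned with a hit, never searched.
                    doc.add(new StoredField("word", row.word()));
                    doc.add(new StoredField("id", row.id()));
                    writer.addDocument(doc);
                }
            }
        } // closing the writer commits the documents
        return new SynonymIndex(dir);
    }

    /** Returns the base word and ID for an exact synonym match, or null if the text is unknown. */
    public Entry lookup(String text) throws IOException {
        TopDocs hits = searcher.search(new TermQuery(new Term("synonym", text)), 1);
        if (hits.scoreDocs.length == 0) {
            return null;
        }
        Document doc = reader.storedFields().document(hits.scoreDocs[0].doc);
        return new Entry(doc.get("word"), doc.getField("id").numericValue().intValue());
    }

    @Override
    public void close() throws IOException {
        reader.close();
        dir.close();
    }
}
```

`StringField` is not tokenized, so a 4- to 5-word name matches only as a whole string, and case matters, which fits the `a`/`Aa`/`AA` example. The reader and searcher are opened once and reused, so each lookup costs only a term query. If the same synonym belongs to two different words, this sketch returns only one of them.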
Many thanks.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Mon, Feb 11, 2013 at 10:03 PM, Ian Lea <ian....@gmail.com> wrote:
> You can certainly use Lucene for this, and it will be blindingly fast
> even if you use a disk-based index.
>
> Just index documents as you've laid it out, with the field you want to
> search on added as indexable and the others stored.
>
> I've never used Guava Table so can't comment on that, but with only a
> few thousand words it would certainly be feasible to use something
> like that. Better? I don't know.
>
> Personally I'd probably go with Lucene, as I'd be positive it would a)
> work and b) be fast even if the thousands end up being tens of
> thousands, or more.
>
> --
> Ian.
>
> On Mon, Feb 11, 2013 at 3:14 PM, Mohammad Tariq <donta...@gmail.com>
> wrote:
> > Hello list,
> >
> > I have a scenario in which I need an in-memory index for faster
> > search. The problem goes like this:
> >
> > I have a list which contains a couple of thousand words. Each word has a
> > corresponding ID and a list of synonyms. The actual word is a column in my
> > HBase table. I get files which contain values for this column, and I have
> > to extract values from these files and put them into the appropriate
> > column. But sometimes the files may contain a synonym instead of the
> > actual word. This is where an index comes into the picture. I need an
> > index that contains all the words along with their IDs and all the
> > synonyms, and it should always be in memory so that inserts into HBase
> > are quick. Something like this:
> >
> > ID     WORD  SYNONYMS
> > 13991  A     a, A, Aa, aa, AA
> >
> > Then the index should look something like this:
> > a   A  13991
> > A   A  13991
> > Aa  A  13991
> > aa  A  13991
> > AA  A  13991
> >
> > So if I get 'a' in the file, I should be able to do a lookup and the index
> > should give me 'A' along with '13991'. I need both the base name and the
> > ID.
> > The names could even be strings of 4 to 5 words.
> >
> > I have all this information stored in an HBase table with two columns,
> > where the first column contains the actual word and the second column
> > contains the entire list of synonyms. The rowkey is the ID.
> >
> > Now, I can't work out whether it is feasible to use Lucene for this, or
> > whether I should go with something like 'Guava Table' or something else.
> > I need some guidance, as being new to Lucene I am not able to think in
> > the right direction. If it is feasible to use Lucene to achieve this, how
> > can I do it efficiently?
> >
> > I am using HBase filters right now to do the fetch, which is slowing down
> > the process.
> >
> > I am sorry if my questions sound too childish or senseless, as I am not
> > very good at Lucene. Thank you so much for your valuable time.
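Ian notes that with only a few thousand words a simpler in-memory structure would also be feasible. As a point of comparison, here is a minimal plain-JDK sketch of the lookup table laid out in the question. It uses a `HashMap` keyed by synonym in place of Guava's `Table`, which this sketch does not use. The class name `SynonymMap` and the `Entry` record are hypothetical, and the sketch assumes Java 16+ for records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-JDK alternative: each synonym maps directly to its base word and ID. */
public final class SynonymMap {

    /** The base word and its ID, as in the "a  A  13991" rows of the question. */
    public record Entry(String word, int id) {}

    private final Map<String, Entry> bySynonym = new HashMap<>();

    /** Registers one table row; a synonym already present is overwritten by this row. */
    public void add(int id, String word, List<String> synonyms) {
        Entry entry = new Entry(word, id);
        for (String synonym : synonyms) {
            bySynonym.put(synonym, entry);
        }
    }

    /** Exact, case-sensitive lookup; returns null when the text is not a known synonym. */
    public Entry lookup(String text) {
        return bySynonym.get(text);
    }

    public static void main(String[] args) {
        SynonymMap map = new SynonymMap();
        map.add(13991, "A", List.of("a", "A", "Aa", "aa", "AA"));
        System.out.println(map.lookup("aa")); // Entry[word=A, id=13991]
    }
}
```

Multi-word names work the same way, since the whole string is the key. If the same synonym appears under two words, the later `add` call silently replaces the earlier mapping. Unlike the Lucene approach, this table must be rebuilt in each JVM that needs it.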