It actually did. I'll cross-check once more to make sure I was doing it correctly.
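
For the cross-check, here is a minimal sketch in plain Java (java.util.HashMap, not Lucene) of the difference between keys stored as is and keys that pass through some normalization first. The class name, the method names and the analyze() helper are made up for illustration; analyze() is only a rough stand-in for an analyzer and is not Lucene's StandardAnalyzer. If 'AA' ever returned 'AA-', one possible explanation (an assumption, not something confirmed in this thread) is that the value went through an analyzed path at index or query time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExactVsAnalyzed {

    // Hypothetical stand-in for an analyzer: lowercases the text and drops
    // every character that is not a letter, digit or space.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N} ]", "").trim();
    }

    // Keys are stored as is, so only an identical string can match.
    static Map<String, String> exactIndex(String... names) {
        Map<String, String> index = new HashMap<>();
        for (String name : names) {
            index.put(name, name);
        }
        return index;
    }

    // Keys are normalized first, so "AA" and "AA-" collapse into one key
    // and the later entry overwrites the earlier one.
    static Map<String, String> analyzedIndex(String... names) {
        Map<String, String> index = new HashMap<>();
        for (String name : names) {
            index.put(analyze(name), name);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> exact = exactIndex("AA", "AA-");
        Map<String, String> analyzed = analyzedIndex("AA", "AA-");

        System.out.println("exact    AA -> " + exact.get("AA"));
        System.out.println("analyzed AA -> " + analyzed.get(analyze("AA")));
    }
}
```

With exact keys the lookup for 'AA' can only return 'AA', so wrapping each value in dots adds nothing. The surprising 'AA-' result only appears once the keys are normalized.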

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Wed, Feb 13, 2013 at 1:44 AM, Ian Lea <ian....@gmail.com> wrote:

> "AA-" indexed as a StringField was matched by a TermQuery for "AA"?
> Sounds surprising.
>
> --
> Ian.
>
> On Tue, Feb 12, 2013 at 6:32 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> > Thanks again Ian. I'll make the changes suggested by you. And I am using
> > dots because if I search for 'AA' it was giving me 'AA-' as well.
> >
> > Warm Regards,
> > Tariq
> >
> > On Tue, Feb 12, 2013 at 9:50 PM, Ian Lea <ian....@gmail.com> wrote:
> >
> >> From a glance it looks fine. I don't see what you gain by adding dots
> >> - you are using a TermQuery which will only do exact matches. Since
> >> you're using StringField your text won't be tokenized but stored as
> >> is. I see you're searching on a mixed case term - that's fine as long
> >> as you don't expect "aaa" to match "AAA". I tend to just downcase
> >> everything because I've wasted so much time over the years on silly
> >> case sensitive bugs.
> >>
> >> RAMDirectory instances will disappear when the application ends so
> >> yes, you'll need to reload on startup. You don't have to recreate for
> >> each search though - create and populate the RAMDirectory on startup
> >> and create an IndexSearcher and use that for all searches.
> >>
> >> Depending on your app it might be easier to use a normal disk based
> >> index. It will probably be fast enough.
> >>
> >> --
> >> Ian.
> >>
> >> On Tue, Feb 12, 2013 at 1:29 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> >> > Hello Ian,
> >> >
> >> > I started as directed by you and created the index. Here is a small
> >> > piece of code which I have written. Please have a look over it:
> >> >
> >> > public static void main(String[] args) throws IOException, ParseException {
> >> >
> >> >   // Specify the analyzer for tokenizing text. The same analyzer should
> >> >   // be used for indexing and searching
> >> >   StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
> >> >
> >> >   // 1. create the index
> >> >   Directory index = new RAMDirectory();
> >> >   IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
> >> >   IndexWriter w = new IndexWriter(index, config);
> >> >   Configuration conf = HBaseConfiguration.create();
> >> >   HTable table = new HTable(conf, "mappings");
> >> >   Scan s = new Scan();
> >> >   ResultScanner rs = table.getScanner(s);
> >> >   int count = 0;
> >> >   String[] localnames;
> >> >   for (Result r : rs) {
> >> >     count++;
> >> >     localnames = Bytes.toString(r.getValue(Bytes.toBytes("cf"),
> >> >         Bytes.toBytes("LOC"))).trim().split(",");
> >> >     for (String str : localnames) {
> >> >       addDoc(w, "." + str + ".",
> >> >           Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("CON"))),
> >> >           Bytes.toString(r.getRow()));
> >> >     }
> >> >   }
> >> >   System.out.println("COUNT : " + count);
> >> >   table.close();
> >> >   w.close();
> >> >
> >> >   // 2. query
> >> >   String term = "";
> >> >   // BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
> >> >   // System.out.println("Enter the term you want to search...");
> >> >   // term = br.readLine();
> >> >   term = "Vacuolated Lymphocytes";
> >> >   TermQuery tq = new TermQuery(new Term("localname", "." + term + "."));
> >> >
> >> >   // 3. search
> >> >   int hitsPerPage = 10;
> >> >   IndexReader reader = DirectoryReader.open(index);
> >> >   IndexSearcher searcher = new IndexSearcher(reader);
> >> >   TopScoreDocCollector collector =
> >> >       TopScoreDocCollector.create(hitsPerPage, true);
> >> >   searcher.search(tq, collector);
> >> >   ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >> >
> >> >   // 4. display results
> >> >   System.out.println("Found " + hits.length + " hits.");
> >> >   for (int i = 0; i < hits.length; ++i) {
> >> >     int docId = hits[i].doc;
> >> >     Document d = searcher.doc(docId);
> >> >     System.out.println("ControlID -> " + d.get("controlid") + "\t" +
> >> >         "Localnames -> " + d.get("localname") + "\t" + "Controname -> " +
> >> >         d.get("controlname"));
> >> >   }
> >> >   // reader can only be closed when there
> >> >   // is no need to access the documents any more.
> >> >   reader.close();
> >> > }
> >> >
> >> > private static void addDoc(IndexWriter w, String local, String control,
> >> >     String rowkey) throws IOException {
> >> >
> >> >   Document doc = new Document();
> >> >   doc.add(new StringField("localname", local, Field.Store.YES));
> >> >   doc.add(new StringField("controlname", control, Field.Store.YES));
> >> >   doc.add(new StringField("controlid", rowkey, Field.Store.YES));
> >> >   w.addDocument(doc);
> >> > }
> >> >
> >> > Does it look fine to you? Or can I make it better by adding or removing
> >> > something? Although it shows just a primitive usage of Lucene, it is
> >> > always better to have some able guidance with us.
> >> >
> >> > One more question. Does the index remain alive only for the lifetime of
> >> > the application if we are using RAMDirectory? I have to run the entire
> >> > process every time I want to search something.
> >> >
> >> > Also, I have added a dot (.) before and after each word before adding
> >> > it to the document so that I can do an exact match search. Is my
> >> > approach correct or is there any other OOTB feature available in
> >> > Lucene which I can use for this?
> >> >
> >> > I am sorry to be a pest of questions and thank you so much for your time.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> >
> >> > On Mon, Feb 11, 2013 at 10:09 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> >> >
> >> >> Hey Ian. Thank you so much for the quick reply. I'll definitely give
> >> >> Lucene a shot. I'll start off with it and get back to you in case of
> >> >> any problem.
> >> >>
> >> >> Many thanks.
> >> >>
> >> >> Warm Regards,
> >> >> Tariq
> >> >>
> >> >> On Mon, Feb 11, 2013 at 10:03 PM, Ian Lea <ian....@gmail.com> wrote:
> >> >>
> >> >>> You can certainly use lucene for this, and it will be blindingly fast
> >> >>> even if you use a disk based index.
> >> >>>
> >> >>> Just index documents as you've laid it out, with the field you want to
> >> >>> search on added as indexable and the others stored.
> >> >>>
> >> >>> I've never used Guava Table so can't comment on that, but with only a
> >> >>> few thousand words it would certainly be feasible to use something
> >> >>> like that. Better? I don't know.
> >> >>>
> >> >>> Personally I'd probably go with lucene as I'd be positive it would a)
> >> >>> work and b) be fast even if the thousands end up being tens of
> >> >>> thousands, or more.
> >> >>>
> >> >>> --
> >> >>> Ian.
> >> >>>
> >> >>> On Mon, Feb 11, 2013 at 3:14 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> >> >>> > Hello list,
> >> >>> >
> >> >>> > I have a scenario wherein I need an in-memory index as I need
> >> >>> > faster search. The problem goes like this:
> >> >>> >
> >> >>> > I have a list which contains a couple of thousand words. Each word
> >> >>> > has a corresponding ID and a list of synonyms. The actual word is a
> >> >>> > column in my HBase table. I get files which contain values for this
> >> >>> > column and I have to extract values from these files and put them
> >> >>> > into the appropriate column. But sometimes files may contain the
> >> >>> > synonym instead of the actual word. Now, this is the place where the
> >> >>> > index comes into the picture. I should have an index that contains
> >> >>> > all the words along with their IDs and all the synonyms, and it
> >> >>> > should always be in memory so that inserts into HBase are quick.
> >> >>> > Something like this:
> >> >>> >
> >> >>> > ID     WORD  SYNONYMS
> >> >>> > 13991  A     a, A, Aa, aa, AA
> >> >>> >
> >> >>> > Then the index should be something like this:
> >> >>> >
> >> >>> > a   A  13991
> >> >>> > A   A  13991
> >> >>> > Aa  A  13991
> >> >>> > aa  A  13991
> >> >>> > AA  A  13991
> >> >>> >
> >> >>> > So that if I get 'a' in the file, I should be able to do a lookup
> >> >>> > and the index should give me 'A' along with '13991'. I need both the
> >> >>> > base name and the ID. The names could even be strings of 4 to 5
> >> >>> > words.
> >> >>> >
> >> >>> > I have all this information stored in an HBase table having two
> >> >>> > columns where the first column contains the actual word and the
> >> >>> > second column contains the entire list of synonyms. And the rowkey
> >> >>> > is the ID.
> >> >>> >
> >> >>> > Now, I am not sure whether it is feasible to use Lucene for this or
> >> >>> > whether I should go with something like 'Guava Table' or something
> >> >>> > else. I need some guidance as, being new to Lucene, I am not able to
> >> >>> > think in the right direction. If it is feasible to use Lucene to
> >> >>> > achieve this, how do I do it efficiently?
> >> >>> >
> >> >>> > I am using HBase filters right now to do the fetch, which is slowing
> >> >>> > down the process.
> >> >>> >
> >> >>> > I am sorry if my questions sound too childish or senseless as I am
> >> >>> > not very good at Lucene. Thank you so much for your valuable time.
> >> >>> >
> >> >>> > Warm Regards,
> >> >>> > Tariq
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
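
The lookup described in the original question at the bottom of the thread (synonym in, base word and ID out) can also be sketched without Lucene. The following is a minimal sketch that assumes a plain java.util.HashMap is enough for a few thousand entries. SynonymLookup, Entry, add() and find() are hypothetical names, and the row values are the sample ones from the question rather than real data. It matches exactly and case-sensitively; downcasing keys on both add and find, as Ian suggests, would be a separate design choice.

```java
import java.util.HashMap;
import java.util.Map;

public class SynonymLookup {

    /** The base word and row ID that a synonym resolves to. */
    static final class Entry {
        final String word;
        final String id;

        Entry(String word, String id) {
            this.word = word;
            this.id = id;
        }
    }

    private final Map<String, Entry> bySynonym = new HashMap<>();

    /** Registers every comma-separated synonym as an exact-match key. */
    void add(String id, String word, String synonymList) {
        Entry entry = new Entry(word, id);
        for (String synonym : synonymList.split(",")) {
            String key = synonym.trim();
            if (!key.isEmpty()) {
                bySynonym.put(key, entry);
            }
        }
    }

    /** Returns the entry for an exact synonym, or null if it is unknown. */
    Entry find(String synonym) {
        return bySynonym.get(synonym.trim());
    }

    public static void main(String[] args) {
        // Build the lookup once at startup and reuse it for every search.
        SynonymLookup lookup = new SynonymLookup();
        lookup.add("13991", "A", "a, A, Aa, aa, AA");

        Entry hit = lookup.find("aa");
        System.out.println(hit.word + " " + hit.id);   // prints "A 13991"
        System.out.println(lookup.find("AA-"));        // prints "null"
    }
}
```

Like a RAMDirectory, this map lives only as long as the application, so it would have to be reloaded from HBase on startup.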