Does Google actually support "*"?

On Wed, Aug 27, 2014 at 9:54 AM, Milind <mili...@gmail.com> wrote:

> I see. This is going to be extremely difficult to explain to end users.
> It doesn't work as they would expect. Some of the tokenizing rules are
> already somewhat confusing. Their expectation is that it should work the
> way their searches work in Google.
>
> It's difficult enough to recognize that because the period is surrounded by
> a digit and a letter (as opposed to 2 digits or 2 letters), it gets
> tokenized. So I'd have expected that C0001.DevNm00* would effectively
> become a search for C0001 OR DevNm00*. But now, because of the presence of
> the wildcard, it's considered as 1 term and the period is not a tokenizer.
> That's actually good, but the fact that the indexed value is still
> considered as 2 terms for wildcard searches makes it very unintuitive. I
> don't suppose that I can do anything about making wildcard search use
> multiple terms if joined together with a tokenizer. But is there any way
> that I can force it to go through an analyzer prior to doing the search?
>
>
> On Tue, Aug 26, 2014 at 4:21 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
> > Sorry, but you can only use a wildcard on a single term. "C0001.DevNm001"
> > gets indexed as two terms, "c0001" and "devnm001", so your wildcard won't
> > match any term (at least in this case.)
> >
> > Also, if your query term includes a wildcard, it will not be fully
> > analyzed. Some filters such as lower case are defined as "multi-term", so
> > they will be performed, but the standard tokenizer is not being called, so
> > the dot remains and this whole term is treated as one term, unlike the
> > index analysis.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Milind
> > Sent: Tuesday, August 26, 2014 12:24 PM
> > To: java-user@lucene.apache.org
> > Subject: Why does this search fail?
> >
> >
> > I have a field with the value C0001.DevNm001. If I search for
> >
> > C0001.DevNm001 --> Get Hit
> > DevNm00* --> Get Hit
> > C0001.DevNm00* --> Get No Hit
> >
> > The field gets tokenized on the period since it's surrounded by a letter
> > and a number. The query gets evaluated as a prefix query. I'd have
> > thought that this should have found the document. Any clues on why this
> > doesn't work?
> >
> > The full code is below.
> >
> >     Directory theDirectory = new RAMDirectory();
> >     Version theVersion = Version.LUCENE_47;
> >     Analyzer theAnalyzer = new StandardAnalyzer(theVersion);
> >     IndexWriterConfig theConfig =
> >         new IndexWriterConfig(theVersion, theAnalyzer);
> >     IndexWriter theWriter = new IndexWriter(theDirectory, theConfig);
> >
> >     String theFieldName = "Name";
> >     String theFieldValue = "C0001.DevNm001";
> >     Document theDocument = new Document();
> >     theDocument.add(new TextField(theFieldName, theFieldValue,
> >         Field.Store.YES));
> >     theWriter.addDocument(theDocument);
> >     theWriter.close();
> >
> >     String theQueryStr = theFieldName + ":C0001.DevNm00*";
> >     Query theQuery =
> >         new QueryParser(theVersion, theFieldName,
> >             theAnalyzer).parse(theQueryStr);
> >     System.out.println(theQuery.getClass() + ", " + theQuery);
> >     IndexReader theIndexReader = DirectoryReader.open(theDirectory);
> >     IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
> >     TopScoreDocCollector collector = TopScoreDocCollector.create(10,
> >         true);
> >     theSearcher.search(theQuery, collector);
> >     ScoreDoc[] theHits = collector.topDocs().scoreDocs;
> >     System.out.println("Hits found: " + theHits.length);
> >
> > Output:
> >
> >     class org.apache.lucene.search.PrefixQuery, Name:c0001.devnm00*
> >     Hits found: 0
> >
> >
> > --
> > Regards
> > Milind
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
>
>
> --
> Regards
> Milind
>
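
For anyone reading this thread later, the behavior Jack describes can be modeled in plain Java without Lucene on the classpath. This is only a rough sketch under stated assumptions: `WildcardModel`, `analyze`, `splitsHere` and `matches` are hypothetical names, the period rule is a simplification of what the standard tokenizer does, and a non-wildcard query is treated as "any term matches". It is not Lucene's actual code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardModel {

    // Simplified stand-in for index-time analysis (assumption): lowercase the
    // text, then split on a period unless it sits between two letters or
    // between two digits.
    static List<String> analyze(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' && splitsHere(s, i)) {
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    // A period at position i splits unless both neighbours are letters
    // or both neighbours are digits.
    static boolean splitsHere(String s, int i) {
        if (i == 0 || i == s.length() - 1) {
            return true;
        }
        char before = s.charAt(i - 1);
        char after = s.charAt(i + 1);
        boolean bothLetters = Character.isLetter(before) && Character.isLetter(after);
        boolean bothDigits = Character.isDigit(before) && Character.isDigit(after);
        return !(bothLetters || bothDigits);
    }

    // A trailing-wildcard query is only lowercased (never tokenized) and is
    // compared as a prefix against each single indexed term. Any other query
    // is analyzed like the field and hits if any of its terms is indexed.
    static boolean matches(List<String> indexedTerms, String query) {
        if (query.endsWith("*")) {
            String prefix = query.substring(0, query.length() - 1).toLowerCase(Locale.ROOT);
            for (String term : indexedTerms) {
                if (term.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
        for (String queryTerm : analyze(query)) {
            if (indexedTerms.contains(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("C0001.DevNm001");
        System.out.println("Indexed terms: " + indexed);
        for (String q : new String[] {"C0001.DevNm001", "DevNm00*", "C0001.DevNm00*"}) {
            System.out.println(q + " --> " + (matches(indexed, q) ? "Get Hit" : "Get No Hit"));
        }
    }
}
```

In this model the prefix "c0001.devnm00" keeps its dot, so it is compared against "c0001" and "devnm001" separately and neither term starts with it, which mirrors the three results reported in the original message.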