Thanks for the Google link. I wasn't aware of it. Most of it is very intuitive and, most importantly, consistent.
On Wed, Aug 27, 2014 at 11:07 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> It's not documented, but Google does seem to support trailing wildcard,
> but only if the prefix has at least six characters. For shorter prefixes,
> it seems to just drop the wildcard.
>
> Google also uses "*" in quoted phrases to mean a placeholder for any
> single term. That's documented.
>
> See:
> https://support.google.com/websearch/answer/136861?hl=en
>
> It also seems to support "**" in a quoted phrase to mean one or more
> arbitrary terms. This isn't documented, but seems to work.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Milind
> Sent: Wednesday, August 27, 2014 10:51 AM
> To: java-user@lucene.apache.org
> Subject: Re: Why does this search fail?
>
> Yes. If you search for alphare on google and alphare*, you get 2 different
> results. Sorry for the contrived example. I just tried searching for
> alpharetta and went backwards deleting characters.
>
> On Wed, Aug 27, 2014 at 10:01 AM, Benson Margulies <ben...@basistech.com>
> wrote:
>
>> Does google actually support "*"?
>>
>> On Wed, Aug 27, 2014 at 9:54 AM, Milind <mili...@gmail.com> wrote:
>>
>> > I see. This is going to be extremely difficult to explain to end users.
>> > It doesn't work as they would expect. Some of the tokenizing rules are
>> > already somewhat confusing. Their expectation is that it should work
>> > the way their searches work in Google.
>> >
>> > It's difficult enough to recognize that because the period is surrounded
>> > by a digit and alphabet (as opposed to 2 digits or 2 alphabets), it gets
>> > tokenized. So I'd have expected that C0001.DevNm00* would effectively
>> > become a search for C0001 OR DevNm00*. But now, because of the presence
>> > of the wildcard, it's considered as 1 term and the period is not a
>> > tokenizer. That's actually good, but now the fact that it's still
>> > considered as 2 terms for wildcard searches makes it very unintuitive.
>> > I don't suppose that I can do anything about making wildcard search use
>> > multiple terms if joined together with a tokenizer. But is there any way
>> > that I can force it to go through an analyzer prior to doing the search?
>> >
>> > On Tue, Aug 26, 2014 at 4:21 PM, Jack Krupansky <j...@basetechnology.com>
>> > wrote:
>> >
>> > > Sorry, but you can only use a wildcard on a single term.
>> > > "C0001.DevNm001" gets indexed as two terms, "c0001" and "devnm001", so
>> > > your wildcard won't match any term (at least in this case.)
>> > >
>> > > Also, if your query term includes a wildcard, it will not be fully
>> > > analyzed. Some filters such as lower case are defined as "multi-term",
>> > > so they will be performed, but the standard tokenizer is not being
>> > > called, so the dot remains and this whole term is treated as one term,
>> > > unlike the index analysis.
>> > >
>> > > -- Jack Krupansky
>> > >
>> > > -----Original Message----- From: Milind
>> > > Sent: Tuesday, August 26, 2014 12:24 PM
>> > > To: java-user@lucene.apache.org
>> > > Subject: Why does this search fail?
>> > >
>> > > I have a field with the value C0001.DevNm001. If I search for
>> > >
>> > > C0001.DevNm001 --> Get Hit
>> > > DevNm00* --> Get Hit
>> > > C0001.DevNm00* --> Get No Hit
>> > >
>> > > The field gets tokenized on the period since it's surrounded by a
>> > > letter and a number. The query gets evaluated as a prefix query. I'd
>> > > have thought that this should have found the document. Any clues on why
>> > > this doesn't work?
>> > >
>> > > The full code is below.
>> > >
>> > >     Directory theDirectory = new RAMDirectory();
>> > >     Version theVersion = Version.LUCENE_47;
>> > >     Analyzer theAnalyzer = new StandardAnalyzer(theVersion);
>> > >     IndexWriterConfig theConfig =
>> > >         new IndexWriterConfig(theVersion, theAnalyzer);
>> > >     IndexWriter theWriter = new IndexWriter(theDirectory, theConfig);
>> > >
>> > >     String theFieldName = "Name";
>> > >     String theFieldValue = "C0001.DevNm001";
>> > >     Document theDocument = new Document();
>> > >     theDocument.add(new TextField(theFieldName, theFieldValue,
>> > >         Field.Store.YES));
>> > >     theWriter.addDocument(theDocument);
>> > >     theWriter.close();
>> > >
>> > >     String theQueryStr = theFieldName + ":C0001.DevNm00*";
>> > >     Query theQuery =
>> > >         new QueryParser(theVersion, theFieldName,
>> > >             theAnalyzer).parse(theQueryStr);
>> > >     System.out.println(theQuery.getClass() + ", " + theQuery);
>> > >     IndexReader theIndexReader = DirectoryReader.open(theDirectory);
>> > >     IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>> > >     TopScoreDocCollector collector =
>> > >         TopScoreDocCollector.create(10, true);
>> > >     theSearcher.search(theQuery, collector);
>> > >     ScoreDoc[] theHits = collector.topDocs().scoreDocs;
>> > >     System.out.println("Hits found: " + theHits.length);
>> > >
>> > > Output:
>> > >
>> > >     class org.apache.lucene.search.PrefixQuery, Name:c0001.devnm00*
>> > >     Hits found: 0
>> > >
>> > > --
>> > > Regards
>> > > Milind
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>> > --
>> > Regards
>> > Milind
>
> --
> Regards
> Milind

--
Regards
Milind
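
For anyone reading this thread later: the mismatch Jack describes (index-time text is tokenized, a wildcard query term is only lowercased) can be reproduced with a toy model that does not use Lucene at all. The sketch below is only an illustration under stated assumptions. The class WildcardSketch and its three methods are hypothetical names, and tokenize() is a crude stand-in that handles just the one rule discussed here (a period between a letter and a digit splits the text); it is not the StandardTokenizer. The last method shows one possible shape for "analyze the prefix first", not a tested Lucene recipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model only: none of this is Lucene code. */
public class WildcardSketch {

    /** Hypothetical stand-in for index-time analysis: lowercase the text, then
     *  split on any period that sits between a letter and a digit (either order). */
    public static List<String> tokenize(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length() - 1; i++) {
            if (s.charAt(i) != '.') continue;
            char before = s.charAt(i - 1);
            char after = s.charAt(i + 1);
            boolean mixed = (Character.isLetter(before) && Character.isDigit(after))
                         || (Character.isDigit(before) && Character.isLetter(after));
            if (mixed) {
                tokens.add(s.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(s.substring(start));
        return tokens;
    }

    /** The behaviour described in the thread: the wildcard term is lowercased but
     *  not tokenized, so the whole prefix is compared against each indexed term. */
    public static boolean prefixMatchUntokenized(List<String> indexedTerms, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(p)) return true;
        }
        return false;
    }

    /** One possible workaround (an assumption, not a Lucene API): tokenize the
     *  prefix too, require every token except the last to equal consecutive
     *  indexed terms, and treat only the last token as a prefix. */
    public static boolean prefixMatchTokenized(List<String> indexedTerms, String prefix) {
        List<String> parts = tokenize(prefix);
        for (int start = 0; start + parts.size() <= indexedTerms.size(); start++) {
            boolean ok = true;
            for (int j = 0; j < parts.size() && ok; j++) {
                String term = indexedTerms.get(start + j);
                String part = parts.get(j);
                ok = (j == parts.size() - 1) ? term.startsWith(part) : term.equals(part);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("C0001.DevNm001");
        System.out.println(indexed);                                           // [c0001, devnm001]
        System.out.println(prefixMatchUntokenized(indexed, "DevNm00"));        // true
        System.out.println(prefixMatchUntokenized(indexed, "C0001.DevNm00"));  // false
        System.out.println(prefixMatchTokenized(indexed, "C0001.DevNm00"));    // true
    }
}
```

The three results line up with the hits reported in the original question: DevNm00* matches because it is a prefix of a single indexed term, while C0001.DevNm00* fails because no single indexed term contains the period. Requiring the leading tokens to match in order keeps the tokenized variant stricter than the "C0001 OR DevNm00*" reading mentioned earlier in the thread.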