I really agree with Robert here. Performance is everything, and since we have a fast variant of this query we really don't need the slow one in core. I don't understand why expert users like you, Mark B., can't make the distinction in app code between Slow/Fast FuzzyQuery. Even if it goes EOL and we drop it, can't your app still carry it? It's ASL 2.0.

I'd also vote -1 here.

simon

On Sat, Nov 10, 2012 at 2:32 AM, Robert Muir <[email protected]> wrote:
> It's a really simple answer.
>
> Your problem (and I quote):
> Content indexed as state:california
> But it seems like I search state:CALIFORNI~0.65 (via solr) it doesn't work.
> I'm worried that Solr isn't running my text through the query analyzers
> first!
>
> This is some analysis chain configuration issue.
>
> We don't need to add support for some unscalable stuff to lucene to
> correct for that: you just need to make sure lowercasing is happening.
>
> NOTE: I will continue to protest/veto/anything I can to block queries
> with horrible complexity, making as much noise as possible, because
> the end solution is for users to index and search content correctly
> and get results in a reasonable amount of time.
>
> If it doesn't work with 100M documents, I don't want it in lucene.
>
> I would have the same opinion if someone wanted unscalable solutions
> for scoring w/ language models (e.g. not happy with smoothing for
> unknown probabilities), or if someone claimed that spatial queries
> should do slow things because they don't currently support
> interplanetary distances, and so on.
>
> On Fri, Nov 9, 2012 at 7:52 PM, Mark Bennett <[email protected]> wrote:
>> Hi Robert,
>>
>> I acknowledge your "-1" vote, and I'm guessing that your objection is maybe
>> 70% "scalability", and only 30% use-case?
>>
>> The older Levenshtein stuff has been around for a long time, scalable or not,
>> and already in real systems.
>>
>> You seem to have a very "binary" view on code being "in" or "out". Is there any
>> room in your world-view of code for "gray code", unsupported, incubator,
>> what-have-you? Maybe analogous to people who jailbreak their iPhones or
>> something?
>>
>> You're an important part of the community, and working at Lucid, etc., and
>> clearly concerned about software quality. When smart folks like you have
>> such sharp opinions I do try to ponder them against my own circumstances.
>>
>> And on the quality of the old code, was it just the scalability, or were
>> there other concerns such as stability, coding style, or possibly
>> inconsistent results?
>>
>> Isn't the sandbox and admonished reference in Java docs sufficient?
>>
>> I'm harping on this because I'm really between a rock and hard place, and
>> also posted another question.
>>
>> Just trying to understand your very strong opinions, and I thank you for
>> your patience in this matter. This issue is either going to fix or break my
>> weekend / next-deliverable.
>>
>> Sincere thanks,
>> Mark
>>
>>
>> --
>> Mark Bennett / New Idea Engineering, Inc. / [email protected]
>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>
>>
>> On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir <[email protected]> wrote:
>>>
>>> I'm -1 for having unscalable shit in lucene's core. This query should
>>> have never been added.
>>>
>>> I don't care if a few people complain because they aren't using
>>> lowercasefilter or some other insanity. Fix your analysis chain. I
>>> don't have any sympathy.
>>>
>>> On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky <[email protected]>
>>> wrote:
>>> > +1 for permitting a choice of fuzzy query implementation.
>>> >
>>> > I agree that we want a super-fast fuzzy query for simple variations, but I
>>> > also agree that we should have the option to trade off speed for function.
>>> >
>>> > But I am also sympathetic to assuring that any core Lucene features be as
>>> > performant as possible.
>>> >
>>> > Ultimately, if there was a single fuzzy query implementation that did
>>> > everything for everybody all of the time, that would be the way to go, but
>>> > if choices need to be made to satisfy competing goals, we should support
>>> > going that route.
>>> >
>>> > -- Jack Krupansky
>>> >
>>> > From: Mark Bennett
>>> > Sent: Friday, November 09, 2012 3:48 PM
>>> > To: [email protected]
>>> > Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira]
>>> > [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
>>> >
>>> > Hi Robert,
>>> >
>>> > On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir <[email protected]> wrote:
>>> >>
>>> >> ...
>>> >> ... I'm strongly against having this
>>> >> unscalable garbage in lucene's core.
>>> >>
>>> >> There is no use case for ed > 2, that's just crazy.
>>> >
>>> >
>>> > I promise you there ARE use cases for edit distances > 2, especially with
>>> > longer words. Due to NDA I can't go into details.
>>> >
>>> > Also ed>2 can be useful when COMBINING that low-quality part of the search
>>> > with other sub-queries, or additional business rules. Maybe instead of
>>> > boiling an ocean this lets you just boil the sea. ;-)
>>> >
>>> > I won't comment on the quality of the older Levenshtein code, or the likely
>>> > very slow performance, nor where the code should live, etc.
>>> >
>>> > But your statement about "no use case for ed > 2" is simply not true.
>>> > (whether you'd agree with any of them or not is certainly another matter)
>>> >
>>> > I understand your concerns about not having it be the default. (or maybe
>>> > having a giant warning message or something, whatever)
>>> >
>>> >> --
>>> >> lucidworks.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
