Doug,

What you say makes a good deal of sense to me. Could you give us a relative sense of the "slowness" of different operators?

Regards
Terry

----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, February 17, 2004 1:16 PM
Subject: Re: SubstringQuery -- Re: Leading Wild Card Search

> David Spencer wrote:
> > 2 files attached, SubstringQuery (which you'll use) and
> > SubstringTermEnum (used by the former to be
> > consistent w/ other Query code).
> >
> > I find this kind of query useful to have and think that the query parser
> > should allow it in spite of the perception
> > of this being slow, however I think the debate is the "user centric
> > view" (say mine, allow substring queries)
> > vs the "protect the engine's performance" view which says not to allow
> > expensive queries.
>
> I think the argument is more complex.
>
> One issue is cost of execution: very slow queries can be used to
> implement a denial-of-service attack. Maybe that's an overstatement,
> but in a web server setting, once a few slow searches are running, no
> others may complete. When folks hit "Stop" in their browser the server
> does not stop processing the query. If they hit "Reload" then another
> new search is started. So these can be very problematic. This is real.
> Lots of folks have deployed Lucene with large indexes and then found
> that their server randomly crashes. Closer scrutiny shows that they
> were permitting operators that are too slow for their combination of
> index size and query traffic. The BooleanQuery.TooManyClauses exception
> was added to address this, but it can still be too late, if the problem
> is caused before the query is built, e.g., while enumerating all terms.
>
> A related issue is that users (and even most developers) don't
> understand the relative costs of different query operators. Some things
> are fast, others are surprisingly slow. That's not a great user
> experience, and triggers problems like those described above. People
> think that the rare slow cases are network problems or something, and
> hit "Reload".
>
> I have no problem with including slow operators with Lucene, but they
> should be well documented as such, at least for developers. Perhaps we
> should make a pass through the existing Query classes, in particular
> those which expand into other queries, and add some performance notes,
> so that folks don't blindly start using things which may bite them. By
> default I think it would be safest if the QueryParser only permitted
> operators which are efficient. Folks can then, at their own risk,
> enable other operators.
>
> In summary, removing operators can be user-centric, if it removes
> unpredictability. And the reason for protecting engine performance is
> not miserly, it's to guarantee availability. And finally, an issue
> dear to me, a predictable search engine results in fewer spurious bug
> reports, saving developer time for real bugs.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
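
For readers of the archive, a rough sketch of the two safeguards discussed in the quoted message: refusing an expensive operator (here, a leading wildcard) before the query is parsed, and putting a budget on term enumeration so a scan is aborted rather than left to run over every term. This is plain Java and does not use Lucene; the class QueryGuard, its methods, and TooManyTermsException are hypothetical names made up for illustration, and the whitespace-based tokenizing is a simplifying assumption, not how QueryParser splits its input.

```java
import java.util.Iterator;

/** Hypothetical helper; not part of Lucene. */
public class QueryGuard {

    /** Thrown when a scan would examine more terms than the caller allows. */
    public static class TooManyTermsException extends RuntimeException {
        public TooManyTermsException(int maxTerms) {
            super("term budget exceeded: " + maxTerms);
        }
    }

    /**
     * Returns true if any whitespace-separated token begins with '*' or '?'
     * once an optional "field:" prefix is removed. Assumption: tokens are
     * split on whitespace only, which ignores quoting and grouping.
     */
    public static boolean hasLeadingWildcard(String query) {
        for (String token : query.trim().split("\\s+")) {
            int colon = token.lastIndexOf(':');
            String term = colon >= 0 ? token.substring(colon + 1) : token;
            if (term.startsWith("*") || term.startsWith("?")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Counts the terms that contain the given substring, giving up once
     * more than maxTerms terms have been examined. The check happens
     * during enumeration, i.e. before any query would have been built.
     */
    public static int countMatches(Iterator<String> terms, String substring, int maxTerms) {
        int visited = 0;
        int matches = 0;
        while (terms.hasNext()) {
            if (++visited > maxTerms) {
                throw new TooManyTermsException(maxTerms);
            }
            if (terms.next().contains(substring)) {
                matches++;
            }
        }
        return matches;
    }
}
```

A caller could check hasLeadingWildcard(userInput) and return an error to the user before any index work starts, and pick maxTerms according to its own index size and query traffic.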
