Sigh. Yeah, I agree that a simple big-O won't work for Lucene. But nonetheless, we really should have some sort of performance characterization. When people ask me about how to characterize Lucene/Solr performance I always tell them that it is highly non-linear, with lots of optimizations and options (tokenizers, stemming, case, n-grams, numeric fields) and highly sensitive to the specifics of the data, so that estimating performance or memory requirements up front is impractical.

I mean, most people don't have a handle on cardinality, actual data size, actual document term counts, or data distribution, so even if we had an accurate performance model most people wouldn't have accurate numbers to feed into it, especially since a lot of use cases involve future data that nobody has seen yet. The average manager thinks they are on top of performance and memory requirements when they can tell you how many raw files and how many giga/tera-bytes of data they have, which clearly won't feed into any sane model of Lucene performance.
Ultimately the best we can do is fall back on doing a proof-of-concept implementation, actually measuring performance and memory for a significant sample of realistic data, and then empirically deducing what the big-O function is for your particular application data and data model.

-- Jack Krupansky

On Fri, Nov 20, 2015 at 4:38 AM, Adrien Grand <jpou...@gmail.com> wrote:

> I don't think the big-O notation is appropriate to measure the cost of
> Lucene queries.
>
> On Wed, Nov 11, 2015 at 8:31 PM, search engine <searchstudy1...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I've been thinking how to use big-O notation to show complexity for
> > different types of queries, like term query, prefix query, phrase query,
> > wild card and fuzzy query. Any ideas?
> >
> > thanks,
> > Zong
> >
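For anyone who wants to try the empirical approach described above, the following is a minimal, illustrative sketch, not something from this thread: it assumes a 2015-era Lucene 5.x Java API (RAMDirectory was later deprecated and removed), a made-up field name "body", and synthetic documents. It indexes the same synthetic data at a few sizes and times a TermQuery and a PrefixQuery against each size, which is the kind of raw measurement you could fit an empirical cost curve to.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class QueryCostProbe {
    public static void main(String[] args) throws Exception {
        // Index synthetic documents at a few sizes and time the same queries
        // against each size, so scaling is observed rather than assumed.
        for (int numDocs : new int[] {10_000, 100_000, 1_000_000}) {
            Directory dir = new RAMDirectory();
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (int i = 0; i < numDocs; i++) {
                    Document doc = new Document();
                    // "body" is a made-up field; a real test should use your own data.
                    doc.add(new TextField("body", "sample text term" + (i % 1000),
                                          Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query[] queries = {
                    new TermQuery(new Term("body", "term42")),
                    new PrefixQuery(new Term("body", "term4")),
                };
                for (Query q : queries) {
                    long start = System.nanoTime();
                    searcher.search(q, 10);       // top-10 search, timing only
                    long micros = (System.nanoTime() - start) / 1000;
                    System.out.printf("%,d docs  %-30s %d us%n", numDocs, q, micros);
                }
            }
        }
    }
}

Plotting latency against index size for each query type is what gives you the empirical "big-O" for your data; a single timing like this is noisy, so a real proof of concept would warm up the JVM, repeat each measurement, and use realistic documents and queries.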