Please forgive me if this is the wrong place to ask. The actual application uses Elasticsearch, but my understanding is that all the actual searches are done at the ES shard level by Lucene. If there is a better place to ask these questions, please let me know.
I'm trying to understand the CPU, memory, and I/O costs of queries. This started when I wanted to construct test queries for reporting system response. Is there documentation, a book, or a particular layer of code I should look at to get an understanding of these costs?

Disclaimer: my background is not in searching, indexing, or NLP. I'm a "systems person" with a broad interest in how the parts of a software system interact.

The corpus is a series of Elasticsearch indexes, broken up by ES "Index Lifecycle Management" to limit ES shard (Lucene index) size. The documents are news articles with a source (domain name, with keyword mapping), an extracted "publication date" (date mapping), and text (keyword mapping). Articles are not necessarily added in publication-date order (although I have proposed partitioning the indices by publication year).

Queries have three aspects:

1. They always have a date range of publication dates to consider.
2. They almost always have a list of source domains to consider (currently expressed as a query string domain:[this OR that OR ...]).
3. They almost always have a user query string (sometimes omitted to get the overall number of articles, which is used to normalize result counts).

The first two are applied as Elasticsearch "filters". Do the number of days, the number of domains, and the number of query terms have equal impact?

High-level users often construct user queries (all applied to the article text) of the form:

(a OR b OR c ...) AND (d OR e OR f ...) ...

At a simple level, how do costs accrue? By a simple count of (lower-case) terms, or by the product of the sums of OR terms?

Sometimes queries contain wildcards:

(a* OR b OR c ...) AND (d* OR e OR f ...) ...

Do the wildcard matches simply increase the number of terms, or are there other major costs?

Thanks in advance,
Phil

P.S.
https://lucene.apache.org/core/discussion.html points to an IRC channel at freenode.net, but it has been down every time I've tried the link, and the Slack channel seems to require an ASF-affiliated email address.
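To make the two cost models in my question concrete, here is a minimal Python sketch. All terms and function names in it are hypothetical, for illustration only. It assumes, as I understand it, that a trailing wildcard such as d* is expanded by enumerating the matching terms in the shard's sorted term dictionary. It only counts terms under the "simple count of terms" and "product of OR-group sizes" models; it does not measure the CPU, memory, or I/O that Lucene or Elasticsearch actually spends.

```python
import bisect
from math import prod

# Hypothetical term dictionary for the article "text" field of one shard.
# Lucene keeps terms sorted, so a sorted list stands in for it here.
TERMS = sorted([
    "ballot", "elect", "election", "elections", "electoral",
    "fraud", "vote", "voter", "voters", "voting",
])


def expand(term, terms=TERMS):
    """Return the dictionary terms a query term stands for.

    A plain term counts as itself (one lookup); a trailing-'*' term is
    assumed to expand to every dictionary term sharing its prefix.
    """
    term = term.lower()
    if not term.endswith("*"):
        return [term]
    prefix = term[:-1]
    # Binary-search to the first term >= prefix, then walk forward while
    # terms still share the prefix (such terms are contiguous when sorted).
    i = bisect.bisect_left(terms, prefix)
    matches = []
    while i < len(terms) and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches


def cost_models(groups):
    """groups: OR-groups that are ANDed together, e.g. [["a*", "b"], ["d", "e"]]."""
    expanded = [[t for q in group for t in expand(q)] for group in groups]
    sizes = [len(g) for g in expanded]
    return {
        "expanded": expanded,
        "sum_of_terms": sum(sizes),             # model 1: simple term count
        "product_of_group_sizes": prod(sizes),  # model 2: product of OR-group sums
    }


if __name__ == "__main__":
    r = cost_models([["vote*", "ballot"], ["elect*", "fraud"]])
    print(r["sum_of_terms"], r["product_of_group_sizes"])  # prints: 9 20
```

With these made-up terms, the four query terms (vote*, ballot, elect*, fraud) expand to nine dictionary terms, while the product of the two OR-group sizes is 20. The gap between those two numbers is what I'm asking about.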